Summary


Asbestos, a new operating system we have prototyped, provides novel labeling
mechanisms that make it easy for programmers to develop secure software. Asbestos
labels  let  applications  be  structured  so  as  to  tolerate  flaws  in  major  parts  of  the 
application  without  compromising  security.  Moreover,  a  new  label  save/restore 
mechanism  pushes  beyond  the  traditional  process  abstraction  to  avoid  continually 
accumulating  restrictions  when  manipulating  data  in  different  compartments.  For 
example,  a  Web  server  that  uses  Asbestos  labels  to  implement  mandatory  controls  on 
client  data  requires  only  one  virtual  memory  page  per  user. 


Introduction 

Today's  software  systems  are  insecure:  floods  of  security  advisories  from  vendors  and 
security  organizations  document  a  steady  stream  of  high-profile  vulnerabilities  in  end- 
user  applications,  operating  systems,  and  even  routers,  leading  to  widespread  attacks, 
often  estimated  to  incur  billions  of  dollars  in  direct  and  indirect  costs. 

Many  vulnerabilities  stem  from  conflicts  between  the  needs  of  application  developers  and 
the  basic  principles  for  building  secure  computer  systems,  such  as  giving  applications 
minimal privilege. Violating the principles (by assuming elevated privilege, for example)
makes development so much easier on conventional operating systems that it's doubtful
the principles will ever be broadly followed there.

Because  one  cannot  hope  to  fix  or  even  understand  all  the  software  people  need  to  run, 
securing  systems  boils  down  to  the  problem  of  reasoning  about  the  behavior  of  large 
amounts  of  software  without  necessarily  understanding  the  software  itself,  a  task  that 
might  be  accomplished  by  redefining  the  interface  between  software  and  the  underlying 
operating  system. 

The  goal  of  this  project  was  to  design  a  new  operating  system,  called  Asbestos,  that 
would  make  it  easy  to  control,  understand,  and  observe  interactions  among  applications 
without  understanding  the  applications  themselves.  This  approach  enables  people  to 
monitor  and  control  systems  and  enforce  a  wide  range  of  security  policies.  A  policy  may 
be  a  set  of  static  constraints  enforced  by  the  system  or,  more  generally,  a  program  that 
imposes  a  narrow  interface  for  exporting  information  or  accessing  other  parts  of  the 
system. Such a system could apply to platforms ranging from resource-poor
sensors  to  high-end  servers,  to  help  enforce  meaningful  security  policies  on  new 
applications  or  privilege-hungry  imported  legacy  code. 

The  scope  of  the  project  was  to  flesh  out  the  design  of  Asbestos  and  to  implement  a 
skeleton  prototype  to  explore  key  design  decisions.  The  design  work  focused  on 
enforcing both discretionary and mandatory access control with Asbestos, and on
exploring  example  applications.  The  detailed  results  are  documented  in  Appendixes 
A  and  B. 

This  project  was  a  short-term  seedling  effort,  which  we  hoped  would  lead  to  a  larger 
follow-on  project  for  Asbestos.  Indeed,  the  most  important  result  is  that  this  project 
helped  lay  the  groundwork  for  the  following  major  implementation  effort,  jointly  funded 
by  DARPA  and  an  NSF  Cybertrust  grant  (CNS-0430425). 


Methods,  Assumptions,  and  Procedures 

Asbestos  is  organized  around  the  idea  of  exposing  and  controlling  messages.  Every 
interaction  between  an  application  and  the  system  or  another  application  is  represented  as 
a  message.  The  key  problems  the  operating  system  faces  are  ensuring  that  messages 
encapsulate  all  interactions,  understanding  what  parts  of  the  system  may  be  observed  or 
modified  with  a  given  message,  and  most  importantly  making  the  mechanisms  that 
control  interaction  available  to  unprivileged  software,  so  that  security  policies  need  not 
all  be  imposed  from  highly  privileged  code. 

The  Asbestos  design  solves  these  problems  using  a  novel  label  scheme  that  can  be 
viewed  as  an  extension  of  capabilities  to  provide  decentralized  Mandatory  Access 
Control  (MAC).  Previously,  capability-based  systems  have  only  achieved  MAC  by 
either  marrying  two  very  different  security  mechanisms  (such  as  capabilities  and  a 
completely  separate  labeling  system),  or  by  granting  processes  completely  disjoint  sets 
of  capabilities  so  as  to  achieve  heavy-weight  isolation. 

Our labeling scheme furthermore has the advantage of being decentralized, so that even
unprivileged processes can make use of the Operating System's (OS) mandatory access
control primitives to control the flow of information. In traditional terms, unprivileged
applications can create, on the fly, compartments that protect the secrecy (or integrity) of
the data and processes they contain. The process that creates a compartment controls what
information  can  leave  the  compartment.  This  ability— to  downgrade  information  within  a 
compartment— is  equivalent  to  possessing  a  capability  in  a  traditional  discretionary 
capability  system.  The  system  can  be  used  in  a  degenerate  way,  in  which  the  ability  to 
downgrade  is  simply  delegated  around  and  conveys  the  ability  to  invoke  services  within  a 
compartment.  However,  applications  developed  for  the  capability  model  can  have 
mandatory  access  control  imposed  by  other  compartments. 

Another  limitation  of  previous  operating  systems  is  that  the  granularity  of  compartments 
is  too  coarse.  We  designed  a  novel  virtual  memory  system  that  can  control  the  flow  of 
information,  even  within  a  single  process.  We  identified  a  target  application  of  a  possibly 
buggy  web  server  handling  sensitive  information. 


Results and Discussion


The  result  of  this  seedling  is  that  it  has  grown  into  a  full-fledged  OS  development  project. 
The  applications  we  investigated — in  particular,  protecting  information  in  web  servers— 
turn  out  to  be  both  important  and  hard  to  address  with  existing  operating  systems. 
Moreover,  the  initial  success  of  our  follow-on  DARPA/NSF  project  suggests  the 
approach  may  be  effective  and  practical. 

More specifically, we have prototyped both the Asbestos kernel and a web server running
on top of Asbestos. The web server is designed around a special Asbestos virtual memory
system, which allows labels to be applied at the granularity of individual pages, so that
one  can  control  the  flow  of  information  even  within  one  process.  Thus,  even  software 
bugs  in  the  web  server  cannot  cause  one  user  to  receive  another's  private  data.  The 
system requires only one or two pages of memory per active user, far less than
traditional operating systems, which would require one process per compartment. The
details  are  in  Appendixes  A  and  B. 


Related  work 

A  number  of  previous  systems  have  individually  provided  either  the  ability  to  isolate 
untrusted  software,  labels,  low-level  interposition  facilities,  or  consistent  intelligible 
interfaces  to  different  types  of  resource. 

Message-based  operating  systems,  such  as  Accent,  Amoeba,  Chorus,  L4,  Spring,  and  V 
can  isolate  system  services  by  running  them  as  independent,  user-level  processes,  and 
provide  natural  support  for  interposition  through  message-based  interfaces.  However, 
none  of  these  systems  can  provide  the  combined  security  and  flexibility  of  Asbestos.  For 
example,  Amoeba  bases  access  control  on  self-authenticating  capabilities,  precluding 
policies  that  restrict  delegation.  L4  uses  a  strict  hierarchy  of  interpositions,  useful  for 
confining  executable  content,  but  not  amenable  to  composition  of  independent 
restrictions on information flow imposed by mutually distrustful parties.


Conclusions 

Asbestos  is  a  new  operating  system  with  a  labeling  mechanism  that  promises  to  enforce 
security  properties  on  applications  without  needing  to  trust  the  bulk  of  the  software 
running on a system. Asbestos labels are in some ways similar to those of previous operating
systems with mandatory access control, but have several novel properties, including the
ability for unprivileged software to create compartments on the fly, decentralized control
over sanitization of data, and the ability to revert the state of a process, so as to let
software safely handle messages bearing data from multiple compartments. We hope that these
properties  will  allow  Asbestos  to  apply  mandatory  access  control  to  a  wider  range  of 
problems  than  in  previous  systems,  and  furthermore  avoid  the  notorious  problem  of 
"accumulating  taints"  on  processes.  The  initial  prototype  shows  promise,  and  the  seedling 
has  led  into  a  full-fledged  implementation  effort  co-sponsored  by  DARPA  and  NSF. 

Appendix B - Petros Efstathopoulos, Maxwell Krohn, Steve VanDeBogart, Cliff Frey,

Abstract

Though system security would benefit if programmers
routinely followed the principle of least privilege [24],
the interfaces exposed by operating systems often stand
in the way. We investigate why modern OSes thwart secure
programming practices and propose solutions.

1  Introduction 

Though many software developers simultaneously revere
and ignore the principles of their craft, they reserve special
sanctimony for the principle of least privilege, or
POLP [24]. All programmers agree in theory: an application
should have the minimal privilege needed to
perform its task. At the very least, developers must follow
five POLP requirements: (1) split applications into
smaller protection domains, or “compartments”; (2) assign
exactly the right privileges to each compartment; (3)
engineer communication channels between the compartments;
(4) ensure that, save for intended communication,
the compartments remain isolated from one another; and
(5) make it easy for themselves, and others, to perform a
security audit.

Unfortunately, modern operating systems render the
application of these requirements onerous, dangerous, or
impossible. In our experience (detailed in Section 2.2),
building least-privileged software is cumbersome and
labor-intensive: following POLP feels more like an abuse
of the operating system’s interface than a judicious use of
its features. Most programmers spare themselves these
difficulties by reverting to monolithic, over-privileged
application designs. Unsurprisingly, this exposes machines
to attacks both old (remote attacks on privileged
servers) and new (“install attacks”, which take advantage
of users’ willingness to run high-privilege installers
to infect machines with adware, spyware, or malware).
We cannot write bug-free applications or prevent honest
users from occasionally executing malicious code. Instead,
our best hope is to contain the damage of evil code
by resurrecting POLP.

In this paper, we examine some ways that current
OSes discourage development of least-privilege applications
(Section 2), then propose OS design ideas that
might encourage it instead. A first approximation of a
POLP-friendly system is one based on capabilities, discussed
in Section 3. Though capabilities have historically
flummoxed application designers, we present a more usable
interface, based on the familiar Unix file system. In
Section 4, we discuss shortcomings in this proposed design:
weaknesses in the separated system might still allow
vulnerabilities to spread, and process-sized compartments
are too coarse-grained. We then propose a solution
based on decentralized mandatory access control [17].
The end result is a new operating system called Asbestos.

2  Lessons  From  Current  Systems 

Modern Unix-like operating systems provide a limited
API for running programs according to POLP. We examine
how far administrators and programmers can push
these features if POLP is their goal.

2.1  chrooting or jailing Greedy Applications

Because Unix grants privilege with coarse granularity,
many Unix applications acquire more privileges than
they require. These “greedy applications” can be tamed
with the chroot or jail system calls. Both calls confine
applications to jails, areas of the file system that
administrators can configure to exclude setuid executables
and sensitive files. FreeBSD’s jail goes further,
restricting a process’s use of the network and interprocess
communication (IPC). System administrators with
enough patience and expertise can chroot or jail
standard servers such as Apache [1], BIND [3] and sendmail
[26], though the process resembles stuffing an elephant
into a taxicab.

Even when possible, the chroot and jail approaches
face more fundamental drawbacks:

Jails are heavyweight. The jailed file system must
contain copies of system-wide configuration files (such
as resolv.conf), shared libraries, the run-time linker,
helper executable files, and so on. Maintaining collections
of duplicated files is an administrative difficulty,
especially on systems with many jailed applications.

Jails  are  coarse-grained.  Running  a  process  in  a 
jail  is  similar  to  running  it  on  its  own  virtual  machine. 
Two  jailed  applications  can  share  files  only  if  one’s 
namespace  is  a  superset  of  the  other,  or  if  inefficient 
workarounds  are  used,  such  as  NFS-mounting  a  local  file 
system. 

Jails  require  privilege.  Unprivileged  users  may  not 
call  chroot  or  jail.1  Jails  are  therefore  ill-suited  for 
containing  the  many  untrusted  applications  that  should 
not  have  privileges,  such  as  executable  email  attachments 
or  browser  plugins. 

Finally, chroot or jail’s ex post facto imposition
of security is no substitute for POLP-based design. For
example, a typical dynamic content Web server (such as
Apache with PHP [18]) runs many logically unrelated
scripts within the same address space. A vulnerability in
any one script exposes all other scripts to attack, regardless
of whether the server is jailed.

Figure 1: Block diagram of the OKWS system. Standard processes are
shaded, while site-specific services and databases are shown in white.
The privileged launcher process launches the demux, publisher, logger
and the site-specific services. The databases shown might either be
running locally, or on different machines.

2.2  Ad-Hoc  Privilege  Separation 

True privilege separation is possible on Unix through a
collection of ad-hoc techniques. For instance, our POLP-based
OK Web Server (OKWS) [12] uses a pool of
worker processes to sequester each logical function (e.g.,
/show-inbox, /change-pw, and /search) of the
site into its own address space. The demux, a small, unprivileged
process, accepts incoming HTTP requests, analyzes
their first lines, and forwards them to the appropriate
workers using file descriptor passing. Workers then
respond to clients directly. A privileged launcher process
starts the suite of processes, ensuring that all are
jailed into empty subtrees of the file system, and that they
do not have the privileges to interact with one another.
Finally, since workers’ chroot environments prohibit
them from accessing the root file system directly, they
write HTTP log entries and read static HTML content
via small, unprivileged helper processes: the logger and
the publisher, respectively. Figure 1 shows a block diagram
of a simple OKWS configuration.
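
The file descriptor passing that the demux relies on can be illustrated with a minimal Python sketch. It is not OKWS code: the function names are hypothetical, a pipe stands in for an accepted client connection, and it assumes Python 3.9 or later on a Unix system, where socket.send_fds and socket.recv_fds are available.

```python
import os
import socket


def forward_connection(channel: socket.socket, fd: int, first_line: bytes) -> None:
    """Demux side (hypothetical name): pass an open descriptor to a worker."""
    # send_fds attaches the descriptor as SCM_RIGHTS ancillary data.
    socket.send_fds(channel, [first_line], [fd])


def accept_forwarded(channel: socket.socket) -> tuple[int, bytes]:
    """Worker side (hypothetical name): receive the descriptor and request line."""
    data, fds, _flags, _addr = socket.recv_fds(channel, 4096, 1)
    return fds[0], data


def demo() -> bytes:
    demux_end, worker_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # A pipe stands in for the client connection the demux accepted.
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"payload from client")
    os.close(write_fd)

    forward_connection(demux_end, read_fd, b"GET /show-inbox HTTP/1.0")
    os.close(read_fd)  # the demux drops its own copy after forwarding

    fd, line = accept_forwarded(worker_end)
    body = os.read(fd, 1024)  # the worker now reads from the client directly
    os.close(fd)
    demux_end.close()
    worker_end.close()
    return line + b" | " + body
```

In a real deployment the two ends of the socket pair would live in different processes; the single-process demo only shows that the received descriptor refers to the same open object.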

The goal of this design is to separate application logic
into disjoint compartments, so that any local vulnerability
(especially in site-specific work processes) cannot
spread. In particular, workers cannot send each other
signals or trace each other’s system calls, they cannot
access each other’s databases, no worker can alter any
executable or library, and workers cannot access each
other’s coredumps. Unfortunately, achieving these natural
requirements complicates OKWS. Its launcher must:

1.  Establish a chroot environment, with the correct
file system permissions, that contains the appropriate
shared libraries, configuration files, run-time
linker, and worker executables.

2.  Obtain  unused  UID  and  GID  ranges  on  the  system. 

3.  Assign the ith worker its own UID u_i and GID g_i.

4.  Allocate a writable coredump directory for each
UID.

5.  Change the ith worker’s executable to have owner
root, group g_i, and access mode 0410.

6.  Call chroot.

7.  For each worker process i: kill all processes running
as user u_i or group ID g_i; fork; change user ID to u_i
and group ID to g_i; chdir into the dedicated dump
directory; and call exec on the correct executable.

The chown call in Step 5, the chroot call in Step 6,
and the setuid call in Step 7 all require privileged system
access, so the launcher must run as root. Unix offers
no guarantees of an atomic UID reservation (as required
in Step 2) or race-free file system permission manipulations
(as required throughout). Even ignoring these potential
security problems, this design requires involved
IPC to coordinate worker and helper processes.
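
To make the privileged sequence concrete, the following Python sketch walks through Steps 3 and 5 to 7 with the standard os module. It is an illustration under stated assumptions, not the OKWS launcher: the function names and the UID/GID numbering are hypothetical, killing stale processes is omitted, and launch_workers only succeeds when run as root.

```python
import os


def assign_ids(first_uid: int, first_gid: int, workers: int) -> list[tuple[int, int]]:
    """Step 3: give worker i the pair (first_uid + i, first_gid + i).

    The contiguous numbering is an assumption made for this sketch.
    """
    return [(first_uid + i, first_gid + i) for i in range(workers)]


def prepare_executable(path: str, gid: int) -> None:
    """Step 5: owner root, group g_i, mode 0410 (owner reads, group only executes)."""
    os.chown(path, 0, gid)
    os.chmod(path, 0o410)


def launch_workers(jail: str, ids, dump_dirs, executables) -> list[int]:
    """Steps 6 and 7. Requires root; all paths are interpreted inside the jail."""
    os.chroot(jail)  # Step 6
    os.chdir("/")
    pids = []
    for (uid, gid), dump_dir, exe in zip(ids, dump_dirs, executables):
        pid = os.fork()  # Step 7 (killing old processes for uid/gid is omitted)
        if pid == 0:
            os.setgroups([])    # drop supplementary groups while still root
            os.setgid(gid)      # group first: setgid fails once root is given up
            os.setuid(uid)
            os.chdir(dump_dir)  # dedicated coredump directory
            os.execv(exe, [exe])
        pids.append(pid)
    return pids
```

Even in this compressed form, three of the calls need root, and nothing in the sketch reserves the UID range atomically, which is the weakness the text points out.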

Other systems use similar techniques to solve related
problems. Examples include remote execution utilities
such as OpenSSH [23] and REX [10], and mail transfer
agents such as qmail [2] and postfix [21]. Considering
these applications and others, a trend emerges: in each
instance, the intricate mechanics of privilege separation
are invented anew. To audit the exact security procedures
of these applications, one must comb tens of thousands
of lines of code, each time learning a new system. Even
automated tools that separate privileged operations [5]
require root access.

2.3  A  User-Level  POLP  Library? 

At first glance, a user-level POLP library might seem
able to abstract the security-related specifics of applications
like OKWS, qmail, and so on. One example of
this approach is found in the Polaris system
for Windows XP [30], which applies POLP to virus-prone
client applications like Web browsers and spreadsheets2
via chroot-like containers. Such solutions have
three drawbacks. First, they require privileged access
to the system. Second, libraries must work around the
lack of good OS support for sharing across containers:
since jailed processes work with copies of files, synchronization
schemes are required to reconcile copies after
changes. (For example, Polaris email plug-ins run in a
jail with a copy of the attachment; a persistent “synchronizer”
process updates the original if the plug-in changes
the copy.) Finally, we suspect that POLP techniques used
in more complicated servers such as OKWS do not generalize
well. As evidence, both OKWS and REX, an
ssh-like login facility, use the same libraries (the SFS
toolkit [16]) but share little security-related code. This
comes as no surprise since the two have very different security
aims: OKWS hides most of the file system, while
REX exposes it to authorized users; OKWS must support
millions of possible users, while REX serves only those
with login access to a given machine; application designers
can extend OKWS with site-specific code, while REX
runs unmodified. Fitting both POLP usages into one general
template seems a tall order.

2.4  Unix  as  a  Capability  System 

One of the main difficulties with ad-hoc privilege separation
is that starting with a privileged process and subtracting
privileges is more cumbersome and error-prone
than starting with a totally unprivileged process and
adding privileges. Unix-like operating systems in general
favor the subtractive model, while capability-based operating
systems [4, 28] favor the additive one. But Unix file
descriptors are in fact capabilities. By hobbling system
calls sufficiently (either through system call interposition
[7, 22] or small kernel modifications) we can emulate
those semantics of capability-based operating systems
that enable privilege separation.

The idea is to allow calls that use already-opened file
descriptors (such as read, write, and mmap), but shut
off all “sensitive” system calls, including those that create
new capabilities (such as open), assign capabilities
control of named resources (such as bind), and perform
file system modifications, permissions changes, or
IPC without capabilities (such as chown, setuid, or
ptrace). In OKWS, the launcher could apply such a
policy to the worker processes, which only require access
to inherited or passed file descriptors. The launcher
could run without privilege, and would no longer navigate
the system call sequence seen in Section 2.2. By
disabling all unneeded privileges, the operating system
could enforce privilege separation by default.
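
The policy just described can be written down as a small table. The Python sketch below is only a model of the decision, not an enforcement mechanism: the call names come from the text, while the classify function and the choice to deny every unlisted call are assumptions of this sketch.

```python
# Calls that act only on descriptors the process already holds (from the text).
FD_CALLS = {"read", "write", "mmap"}

# Calls the text names as sensitive: they mint capabilities, bind named
# resources, or change permissions or processes without going through a capability.
SENSITIVE_CALLS = {"open", "bind", "chown", "setuid", "ptrace"}


def classify(syscall: str) -> str:
    """Return the verdict a worker process would get for one system call.

    Denying calls that appear in neither set is an assumption of this sketch;
    the text only says that sensitive calls are shut off.
    """
    if syscall in FD_CALLS:
        return "allow"
    if syscall in SENSITIVE_CALLS:
        return "deny: sensitive"
    return "deny: unlisted"


def rejected(trace: list[str]) -> list[str]:
    """Filter a trace of attempted calls down to those the policy refuses."""
    return [call for call in trace if classify(call) != "allow"]
```

For example, rejected(["read", "open", "write", "ptrace"]) leaves only the two calls that would create or bypass capabilities.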

This works because Unix’s capability-like system
calls are virtualizable. Processes are usually indifferent
to whether a file descriptor is a regular file, a pipe to another
process, or a TCP socket, since the same read and
write calls work in all three cases. In practical terms,
virtualization simplifies POLP-based application design.
Splitting a system into multiple processes often involves
substituting user-space helper applications for kernel services;
for instance, OKWS services write log entries to
the logger instead of a Unix file. With virtualizable system
calls, user processes can mimic the kernel’s interface;
programmers need not rewrite applications when
they choose to reassign the kernel’s role to a process.

More important, virtualizable system calls enable interposition.
If an untrustworthy process asks for a sensitive
capability, a skeptical operator can babysit it by
handing it a pipe to an interposer instead. The interposer
allows harmless queries and rejects those that involve
sensitive information. If the kernel API is virtualizable,
then the operator need not even recompile the untrustworthy
process to interpose on it.
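
A minimal sketch of interposition, assuming a request/reply service with string messages: the Interposer class, the directory_service backend, and the "EPERM" reply are all hypothetical names chosen for illustration. The client calls query in the same way whether it holds the real service or the interposer, which is the property virtualizable interfaces provide.

```python
from typing import Callable


class Interposer:
    """Sits between an untrusted client and a backend with the same interface."""

    def __init__(self, backend: Callable[[str], str],
                 is_sensitive: Callable[[str], bool]) -> None:
        self._backend = backend
        self._is_sensitive = is_sensitive
        self.denied: list[str] = []  # audit trail of rejected requests

    def query(self, request: str) -> str:
        if self._is_sensitive(request):
            self.denied.append(request)
            return "EPERM"
        # Harmless requests are passed through unchanged.
        return self._backend(request)


def directory_service(request: str) -> str:
    """Stand-in for the real service the client asked for."""
    records = {"motd": "hello", "passwd": "root:x:0:0"}
    return records.get(request, "ENOENT")


def untrusted_client(query: Callable[[str], str]) -> list[str]:
    """The client cannot tell which implementation it was handed."""
    return [query("motd"), query("passwd")]
```

Handing the client directory_service yields both records, while handing it Interposer(directory_service, lambda r: r == "passwd").query yields the harmless record and a refusal, with no change to the client.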

Unfortunately, most Unix system calls resist virtualization.
Some do not involve any capability-like objects;
others use hard-wired capabilities hidden in the kernel,
such as “current working directory” and “file system
root”. User-level emulation of these problematic calls
(which include open) is messy, if not impossible; but
scrapping open in the name of POLP seems unlikely to
compel the average programmer.

3  Operating  System  Support  for  POLP 

With the lessons from Unix, we can now imagine a
POLP-friendly operating system interface, one in which
all system calls are capability-based and virtualizable
like read and write. Adding universal virtualization
support to a Unix-like capability system would cover all
five POLP requirements. With capabilities, application
programmers can split their program into isolated compartments
(#1 and #4), granting each compartment exactly
the privileges necessary to complete its task (#2).
With virtualization, programmers use standard interfaces
and libraries for communication between these compartments
(#3), and auditors can understand this communication
by interposing at the interfaces (#5). A new take on
capabilities, one whose Unix-like appearance would be
friendlier to application programmers, could simplify
the application of POLP. This section presents a hypothetical
design for such a system, which we’ll call Asnix.

3.1  Asnix  Design 

In Asnix, interactions between a process and other parts
of the system take the form of messages sent to devices.
Devices include processes and system services as well
as hardware drivers. Messages follow the outline “perform
operation O on capability C, and send any reply
to capability P.” The kernel forwards this message to
the device that originally issued C. There are a small
number of operation types, as in NFS [25] and Plan 9’s
9P [19]: LOOKUP, READ, WRITE, and so forth. The message
types and their associated syntax are conventions;
the kernel only enforces or interprets those messages sent
to kernel devices. Requests and replies are sent and received
asynchronously.

This design aids virtualization. All of a process’s interactions
with the system, whether with the kernel or
other user applications, take the same form, explicitly
involve capabilities, and shun implicit state. Consider, for
example, the Unix call open("foo"). This call in Asnix
would translate to a message that a process P sends
to the file server device FS:

P  ->  (Ccwd,  LOOKUP,  "foo",  CP)  ->  FS. 

The first argument is a capability Ccwd that identifies P’s
current working directory. The second is the command
to perform, the third represents the arguments, and the
fourth is the capability to which the file system should
send its response. Since Asnix makes explicit the CWD
state hidden in the Unix system call, either the file server
or a user process masquerading as the file server can answer
the message.
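
The message format and the kernel's routing rule can be sketched in a few lines of Python. Asnix is a hypothetical design, so every name below (Capability, Message, Kernel, make_server, open_file) is invented for illustration, and the sketch delivers replies synchronously as return values, whereas the text specifies asynchronous requests and replies.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Capability:
    issuer: str  # name of the device that issued this capability
    ident: int


@dataclass(frozen=True)
class Message:
    target: Capability    # C: the capability operated on
    op: str               # O: LOOKUP, READ, WRITE, ...
    args: tuple
    reply_to: Capability  # P: where any reply should go


class Kernel:
    """Routes each message to whichever device issued its target capability."""

    def __init__(self) -> None:
        self.devices: dict[str, Callable[[Message], tuple]] = {}

    def register(self, name: str, handler: Callable[[Message], tuple]) -> None:
        self.devices[name] = handler

    def send(self, msg: Message) -> tuple:
        return self.devices[msg.target.issuer](msg)


def make_server(name: str) -> Callable[[Message], tuple]:
    """Build a device that answers LOOKUP with a fresh capability it issues."""
    def handle(msg: Message) -> tuple:
        if msg.op != "LOOKUP":
            return (msg.reply_to, "ERROR", None)
        found: Optional[Capability] = Capability(name, msg.target.ident + 1)
        return (msg.reply_to, "OK", found)
    return handle


def open_file(kernel: Kernel, cwd: Capability, name: str, reply: Capability) -> tuple:
    """The Asnix analogue of open(name): a LOOKUP on the caller's cwd capability."""
    return kernel.send(Message(cwd, "LOOKUP", (name,), reply))
```

Because the working directory is an explicit argument, the same open_file call reaches the file server or a user process posing as one, depending only on who issued the cwd capability the caller was given.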

3.2  Naming and Managing Capabilities

When an Asnix process P1 launches a child process P2,
it typically grants P2 a number of capabilities, ranging
from directories on the file system to opened network
connections. How can P2 then access these capabilities?
Traditional capability systems such as EROS favor
global, persistent naming, but persistence has proven
cumbersome to kernel and application designers [27].

Instead, we advocate a per-process, Unix-style
namespace. Under Asnix, P1 makes capabilities available
to P2 as files in P2's namespace. Suppose P1’s
namespace contains a tree of files and directories under
/secret, and P1 wishes to grant P2 access to files under
/secret/bob. As in Plan 9 [20], P1 can mount
/secret/bob as the directory /home in P2's namespace.
Unlike in Plan 9, the state implicit in the per-process
namespace is handled at user level, and the kernel
only traffics in messages sent to capabilities. For example,
when the process P2 opens a file under /home,
the user level libraries translate the directory /home to
some capability C. The kernel sees a LOOKUP message
on C.
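
The user-level translation from path to capability might look like the following Python sketch. The Namespace class and its longest-prefix matching rule are assumptions made for illustration; the text says only that user-level libraries map a directory such as /home to some capability C.

```python
class Namespace:
    """Per-process table mapping mount points to capabilities (user-level state)."""

    def __init__(self) -> None:
        self._mounts: dict[str, object] = {}

    def mount(self, path: str, capability: object) -> None:
        # Normalize so that "/home/" and "/home" name the same mount point.
        self._mounts[path.rstrip("/") or "/"] = capability

    def resolve(self, path: str) -> tuple[object, list[str]]:
        """Return (capability, remaining components) for the longest mounted prefix.

        The remaining components are what a library would send in LOOKUP
        messages on the returned capability.
        """
        parts = [p for p in path.split("/") if p]
        for cut in range(len(parts), -1, -1):
            prefix = "/" + "/".join(parts[:cut])
            if prefix in self._mounts:
                return self._mounts[prefix], parts[cut:]
        raise FileNotFoundError(path)


def build_child_namespace() -> Namespace:
    """What P1 might set up for P2: only /secret/bob is visible, as /home."""
    ns = Namespace()
    ns.mount("/home", "cap:/secret/bob")  # the string stands in for a capability
    return ns
```

In this model P2 has no way to name /secret/alice: the path resolves to nothing, so no message about it ever reaches the kernel.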

3.3  OKWS  Under  Asnix 

We now consider what OKWS might look like on Asnix.
Similar to before, the application suite consists of
a launcher, demux and worker processes. Under Asnix,
the logger process simply enforces append-only access
to a log file, and might be useful for many applications
(much like syslogd on today’s systems). No publisher
process is needed.

The launcher starts each worker process with an
empty namespace (and thus no capabilities), then augments
their namespaces as follows:

•  In the logger's namespace, mounts a logfile on
/okws/log.

•  In the demux's namespace, mounts TCP port 80
on /okws/listen. For each worker process i,
makes a socket pair and connects one end to
/okws/worker/i.

•  In worker process i's namespace, mounts the other
end of the above socket pair to /okws/listen.
Mounts a connection to the logger on /okws/log.
Mounts a read-only capability to the root HTML directory
on /www.

•  In all namespaces, makes required shared libraries
available under /lib.

The  launcher  then  launches  all  processes  as  before. 
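
The namespace layout in the list above can be summarized as data. This Python sketch only models which mount points each process would see; the okws_namespaces function and the descriptive strings that stand in for capabilities are hypothetical.

```python
def okws_namespaces(worker_names: list[str]) -> dict[str, dict[str, str]]:
    """Build the per-process mount tables the launcher would set up."""
    logger = {"/okws/log": "file:logfile"}
    demux = {"/okws/listen": "tcp:80"}
    workers: dict[str, dict[str, str]] = {}

    for name in worker_names:
        # One socket pair per worker: end "a" for the demux, end "b" for the worker.
        demux[f"/okws/worker/{name}"] = f"socketpair:{name}:a"
        workers[name] = {
            "/okws/listen": f"socketpair:{name}:b",
            "/okws/log": "conn:logger",
            "/www": "dir:html:read-only",
        }

    spaces = {"logger": logger, "demux": demux, **workers}
    for mounts in spaces.values():
        mounts["/lib"] = "dir:shared-libraries"  # needed by every process
    return spaces
```

Printing okws_namespaces(["show-inbox", "change-pw"]) makes the isolation visible: each worker sees its own end of one socket pair under /okws/listen and nothing that belongs to another worker.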

Under  Unix,  the  launcher  had  to  carefully  construct 
jails,  physically  copying  over  files  and  invoking  custom 
helper  applications  like  the  publisher  and  logger  to  limit 
file  system  access.  Asnix,  by  contrast,  lets  the  launcher 
expose  capabilities  to  child  processes  at  arbitrary  points 
in  their  namespaces.  Each  child  receives  a  synthetic  file 
system  perfectly  suited  to  its  task. 


Moreover, all capabilities available to the Asnix
OKWS processes are virtualizable. Workers accept connections
on /okws/listen regardless of whether they
originate from the kernel’s TCP stack or the demux. Similarly,
logging might be to a raw file or through a logging
process that enforces append-only behavior; worker processes
are oblivious to the difference.

3.4  Discussion 

So far, the proposed system features no individually
novel ideas; rather, it finds a new point in the OS design
space amenable to secure application construction.
Similar effects might be possible with message-passing
microkernels, or unwieldy system call interposition modules.
But in Asnix, the security primitives are few and
simple, for both the kernel and application developer. Although
the interface exposed to applications feels like
the familiar Unix namespace (with added flexibility for
unprivileged, fine-grained jails), an application’s system
interactions are entirely defined by its capabilities, and
Asnix behaves like a capability system for the purposes
of security analysis.

4  Fine-Grained  POLP  with  MAC 

Though we believe Asnix is an improvement over the
status quo, it still falls short of enabling the high-level,
end-to-end security policies we seek. Applications in Asnix
can only express security policies in terms of processes,
but processes often access many different types
of data on behalf of different users. A security policy
based on processes alone can therefore conflate data
flows that ought to be handled separately. For example,
OKWS on Asnix achieves the policy that data
from a /change-pw process cannot flow to a corrupted
/show-inbox process; but the policy says nothing
about whether user U's data within /show-inbox can
flow to user V, meaning an attacker who compromises
/show-inbox might be able to read an arbitrary user’s
private e-mail.

Of course, a much better policy for OKWS would be
that “only user U can access user U's private data”. We
would like to separate users from one another, much as
we separated services in Section 3. Though a user session
involves many different processes (such as the demux,
databases [note 3], and worker processes), a policy for
separating users should be achievable with a small, simple,
isolated block of trusted code, as opposed to hidden
authorization checks scattered throughout the system. This
section extends Asnix to a new system, Asbestos, whose
kernel uses flexible mandatory access control primitives
to enforce richer end-to-end security policies. We are
currently designing and building Asbestos as a full operating
system for x86 machines.

4.1  Complete  Isolation 

One possible approach to better isolation, which we call
complete isolation, would be to prohibit server-side processes
from speaking for multiple users. The server must
be prepared to run a process for every service-user pair;
trusted code in demux would route traffic accordingly.
Similarly, a database process exists for each user, writing
to a user-specific database file. Capabilities can guarantee
separation between processes as usual. More drastic
separation is possible with virtual machines [11, 32] so
that each machine can only speak for one user.

Complete isolation hides a user’s data from other
users, but at significant cost. First, such systems are not
scalable, requiring either an expensive fork-accept-close
model or a huge pool of largely idle per-user servers.
Second, these systems do not accommodate convenient
data sharing, even with trusted processes. While traditional
systems could use simple SQL statements to aggregate
statistics over rows of a site’s databases, completely
isolated systems would have to search millions
of separate files, perhaps over NFS in the case of separated
virtual machines. Separation in this case requires a
tremendous sacrifice in flexibility for data management.
Data will not flow where it shouldn’t, because it cannot
flow at all.

4.2  Decentralized,  Fine-Grained  MAC 

Asbestos uses decentralized, fine-grained mandatory access
control (MAC) primitives to solve this problem in
a flexible and scalable manner. Subjects on the system,
such as processes, I/O channels, and files, are assigned
labels, and special privilege is needed to remove a label
once applied. Furthermore, a subject transmits its labels
to any other objects that it successfully communicates
with. With labels, Asbestos tracks all subjects that have
accessed a given object, whether directly or via proxy.

We propose two important modifications to traditional
MAC-based operating systems. First, decentralization
[17]: processes can create their own labeling
schemes on the fly, so that a Web server can associate
each remote user with her own label. Second, labels apply
at the fine-grained level of individual memory pages,
so that a single process can act on behalf of mutually
distrustful users without fear of leaking data among them.
Taken together, these two modifications allow application
designers to dynamically partition server processes
into isolated sub-processes, where a sub-process consists
of a set of virtual pages that share the same label.

When a server process receives a message, it is
automatically assigned to a sub-process based on the label
of the message’s source. Processing a message from user
U “contaminates” the process with U’s labels. As in
traditional MAC, contamination with the label U prevents
a process from accessing resources forbidden to user
U, such as user V’s network connection. Thus, the kernel
must allow a process speaking on behalf of multiple users
to purge its labels without leaking data. Asbestos lets a
process flush its register state, remap its memory, and
clear its labels, allowing it to serve a request on behalf of
a different user V. However, the system still accommodates
trusted declassifiers, such as statistics collectors,
that can act on behalf of multiple users and traverse
sub-process boundaries within a virtual address space.

With decentralized, fine-grained MAC, OKWS can
achieve a strong end-to-end security policy. The only
trusted code is a labeler module upstream of demux,
which works as follows. When user U connects to the
Web server, the labeler peeks at the incoming TCP connection
T and authorizes it based on session state or login
information. If authorization succeeds, the labeler labels
T with U’s label. Now, any process that reads from T
and writes to memory will automatically tag that memory
page with U’s label, and will therefore push that page
into U’s sub-process. The kernel allows an unprivileged
process to accumulate labels for different users (such as
for U and V), but it forbids that process from writing to a
network channel not labeled with both. Thus, if U
compromises a server process and convinces it to read from
V’s memory, the server process will acquire labels for
both U and V, and therefore cannot write out to T.
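The accumulate-then-check rule in this paragraph can be illustrated with a deliberately simplified model. The sketch below assumes one bit per user label, which is not how the paper describes labels; the type `labelset_t` and the functions `taint_on_read` and `may_write` are hypothetical names chosen for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: one bit per user label. The real system's
 * label representation is not specified in this section. */
typedef uint32_t labelset_t;

#define LABEL_U ((labelset_t)1u << 0)
#define LABEL_V ((labelset_t)1u << 1)

/* Reading labeled data adds the source's labels to the reader. */
labelset_t taint_on_read(labelset_t reader, labelset_t source) {
    return reader | source;
}

/* A write is allowed only if the channel carries every label the
 * writer has accumulated. */
bool may_write(labelset_t writer, labelset_t channel) {
    return (writer & ~channel) == 0;
}
```

Under this model, a process that has read only U's data may write to the connection T labeled with U. Once it also reads V's memory it holds both labels, so a write to T is refused, matching the compromise scenario described above.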

4.3  Discussion 

This decentralized MAC design, combined with the capability
architecture from Section 3, makes POLP convenient
and practical for an OKWS-like Web server.
We have no proof that other applications would similarly
benefit from Asbestos, but we are optimistic. Asbestos
provides simple, flexible, and fine-grained mechanisms
for achieving the five important POLP requirements
without sacrificing performance.

5  Related  Work 

Asbestos proposes the marriage of previous ideas in
systems: the capability-based operating system [4, 13,
28, 33], the per-process name space [20], the virtualizable
kernel interface (the logical extension of system-call
interposition libraries [7, 22]), and decentralized
MAC [17].

Naturally, other operating systems predating Asbestos
meet related design goals or offer similar features.
Message-based operating systems such as L4, Amoeba,
V, Chorus and Spring can isolate system services by running
them as independent, user-level processes and provide
natural support for interposition through message-based
interfaces [14]; Trusted Mach in particular views
message passing from a security perspective [6]. But
ports in microkernel systems are coarse as capabilities
go; for instance, a process can have a capability for the
file server but not for a particular directory. For POLP,
application programmers need arbitrary collections of
specific capabilities; in this respect, the microkernels of
yesteryear do not fit the bill.

The Flask System applies MAC to the Fluke Microkernel
[29]. Many of Flask’s core design principles have
found a modern incarnation in SELinux [15], which,
like TrustedBSD [31], adds mandatory access control to
popular Unix systems. In both, static policy files dictate
which resources applications might access, and how
processes can interact with one another. Such systems
are attractive because they preserve the POSIX interface
to which many programmers are accustomed. However,
their policy extension model, which is based on privileged
files and kernel modules, appears to fall short of
the decentralized and uniformly analyzable policies
implemented by Asbestos labels.

Type safety is another way to enforce operating
system security. Coyotos combines capabilities with
language-level verification techniques [27]. Singularity
combines strong isolation with a type-safe ABI [8]. At
user level, the Java Sandbox uses customizable policies
to specify an applet’s access rights; dynamic sandboxing
shows these policies can be automatically produced [9].

Notes

1 Were it not for this prohibition, unprivileged users could use
control of the chrooted top-level directory to elevate privileges. The
attack is to make a new directory /tmp/foo, hard link from
/tmp/foo/su to the system su, write a new password file
/tmp/foo/etc/passwd, call chroot on /tmp/foo, and then call su from
within the jail.

2 Polaris appears less well suited to larger servers.

3  We  assume  for  simplicity  that  databases  run  locally,  though  all 
concepts  discussed  can  generalize  to  distributed  deployments. 

Abstract

Widely used operating systems do not provide adequate
mechanisms for security-minded programmers: developing
secure applications is complicated and error-prone. Asbestos,
a new operating system, provides novel labeling mechanisms
that make it easy for programmers to develop secure software.
Asbestos labels let application developers create a wide
range of easily understood access policies, both mandatory
and discretionary; new label save/restore support makes it
possible to implement efficient services with these labels. A
Web server that uses Asbestos labels to implement mandatory
controls on client data requires only one virtual memory page
per user, demonstrating that the additional security comes at
an acceptable cost.

1  Introduction 

Today’s computer systems have an undeniably bad track
record in security. We routinely hear of Web servers [19] and
other systems [28] experiencing catastrophic breaches that divulge
tens or hundreds of thousands of people’s private information.
End users suffer from viruses and spyware that,
through various applications, infiltrate their operating systems,
leak clickstream data, send spam, participate in denial
of service attacks, and perform other malicious actions.

Most of these problems can be attributed to two factors:
exploitable flaws in software, and users’ willingness to
run malicious code disguised as legitimate software or documents.
Unfortunately, neither factor appears likely to improve
significantly in the near future. Thus, the most viable means
of improving security in practice may be designing systems
that accommodate these threats. For example, an email reader
should be able to confine an executable attachment by only
giving it access to a display window and perhaps a temporary
file system. A Web site should be able to ensure that one
user’s private data cannot be sent to another user’s browser by
a buggy Web server.

Confining processes and limiting information flow require
the ability to enforce nondiscretionary security policies.
To date, most operating system support for nondiscretionary
policies has taken the form of multi-level secure (MLS) systems
suitable for military-style classification policies [6].
MLS systems primarily allow security administrators to impose
external constraints on existing software. The nondiscretionary
access control mechanisms in these systems typically
cannot be used by ordinary users to craft their own policies,
cannot help developers restructure applications to tolerate
breaches, and cannot be applied at a fine granularity to
protect large numbers of users’ data, as would be required for
a typical Web site. Language-based information flow systems
can support somewhat more decentralized policies [26], but
there are significant advantages to implementing flow control
at the OS level, not least the smaller trusted computing
base and hardware support for protection. Furthermore, even
these systems may not support the dynamic, decentralized
creation of principals. On the other hand, capability-based
operating systems offer some attractive features, including
dynamic principal creation and fine-grained access control, but
give up explicit control of information flow.

Motivated by the difficulty of writing secure code for today’s
operating systems, and believing that a clean-slate design
would more likely lead to advances that could eventually
be mapped back to more conventional OSes, we describe
here a new operating system, Asbestos, that combines the
advantages of capability-based and nondiscretionary-access
systems. Asbestos access control is based on a single simple
primitive, Asbestos labels, that can implement both discretionary
and nondiscretionary access policies in a completely
decentralized fashion. Any process may create an access control
space, represented by a handle, and control the policies
applied relative to that space. Labels straightforwardly implement
traditional capabilities, mandatory access control, and
hybrid schemes.

But process-granularity labels themselves are not sufficient
to build fast, secure applications. A Web service, for example,
may speak concurrently to multiple users. The service
author might like to enforce a policy in which no user could
see or touch another user’s data. Process labels would force
such a service to be implemented with one process per user,
at significant resource cost. Asbestos therefore supports
finer-granularity access control, called label save/restore, that lets
a service apply labels selectively to specific memory pages.
The operating system ensures, with label checks and virtual
memory operations, that pages for a given user are visible
only during the processing of requests for that user. Even if
one user’s instance of the process is broken into, all other
user data is safe. This primitive not only helps make applications
more secure, it also might facilitate new types of services:
users could safely be granted more control over the
code running on their behalf, since labels ensure that other
users’ data will never be compromised. Our measurements
indicate that label save/restore lets an operating system run
a Web server with the same, or more comprehensive, access
control checks as a one-service-per-user model, but while using
just one page of memory per user. Our implementation
runs on real x86 hardware. Despite a completely untuned
implementation, performance implications of label save/restore
are modest.

The contributions of this paper include Asbestos labels,
label save/restore, and our example application, a dynamic
Web server using labels and label save/restore to provide
memory-efficient, safe support for multiple-user services.

1.1  A  motivating  application:  a  better  web  server 

Dynamic Web servers often serve as gateways to databases
containing private information. In this role, a Web server is
expected to provide users with the data they require without
exposing data that should remain private. Unfortunately, as
Web sites grow in size and complexity, Web developers are
more likely to forget or misapply access controls. Worse still,
access-control techniques applied at the application level are
vulnerable to attacks on an often bloated trusted computing
base, including the operating system (e.g., Linux), the system
library (e.g., libc), the Web server (e.g., Apache), any Web
server modules (e.g., PHP or SSL), and the Web application
itself (e.g., change-pw.php). A vulnerability at any level
may allow an attacker to gain access to private data.

We believe three principles are necessary when building
secure applications such as Web servers. First, whichever security
primitives are invoked, they should be implemented
with the smallest possible trusted computing base. Second,
they should be mandatory as opposed to discretionary. Third,
failures should be isolated; a bug in one function should not
compromise all data.

Software such as the OK Web server (OKWS) has shown
it is possible to contort the existing Unix interface to achieve
some security goals, such as isolation, while ignoring the
others [17]. In OKWS, each logical Web service (such as
change-pw or check-inbox) runs as a separate process
with its own address space. If an attacker compromises and
controls check-inbox, he cannot collect or change arbitrary
user passwords (e.g., through change-pw).

With the benefit of Asbestos labeling, OKWS could do
much better. On current operating systems, if a remote user
A compromises an OKWS service (e.g., check-inbox), he
would have all privileges of that service (e.g., reading other users’
email). Under Asbestos, even if a remote user A can compromise
a service, kernel protections prevent him from accessing
user B's information. An OKWS-like Web server in Asbestos
can achieve all three security principles.


2  Related  Work 

Labels have long been used to enforce mandatory access control
and are required by higher divisions of the DoD Trusted
Computer System Evaluation Criteria [6]. Security enhancement
packages with labels are available today for popular
operating systems such as Linux [21] and FreeBSD [36].
The idea of dynamically adjusting labels to track potential
information flow dates back to the High-Water-Mark security
model [18] of the ADEPT-50 in the late 1960s. Numerous
systems have incorporated such mechanisms, including
IX [23] and LOMAC [8].

Asbestos labels differ significantly from those of previous
operating systems in several ways. In Asbestos, any process
can dynamically create a label category, a handle, and
control the propagation of information labeled with that category.
Ordinary processes can both declassify information and
raise the security clearance of other processes, but only in
the particular categories they control. By contrast, traditional
MAC systems have a fixed number of compartments and security
levels, all under the control of the security administrator.
The ORAC model [22] does support the idea of individual
originators placing accumulating restrictions on data, but data
can still only be sanitized by users with the privileged
Downgrader role. Asbestos also differs from previous systems in
that its virtual memory system allows labels to control the
flow of information within a single process.

Asbestos labels more closely resemble language-level
flow control mechanisms. Jif [27], in particular, was an inspiration
for Asbestos because of its support for decentralized
declassification through separate ownership of different
label components. Because it is a programming language, Jif
has the advantage of being able to perform most of its label
checks statically, at compile time. Run-time checks can affect
control flow on failure, thereby creating implicit information
flows [5]. However, compared to Asbestos, Jif requires a
centralized principal hierarchy and has no equivalent to the
asymmetric label defaults Asbestos uses to support policies such as
preventing one process from talking to another.

Because Asbestos handles are also communication endpoints,
they can be thought of as capabilities. Many believe
that capabilities cannot implement mandatory access
control [2]. Strictly speaking, this is untrue. For instance,
KeyKOS [15] achieved military-grade security by isolating
processes into compartments and ensuring any capability
pointing outside a compartment designated a special
multi-level secure object. EROS [32] later successfully realized the
principles behind KeyKOS on modern hardware.

Psychologically, however, people have not accepted pure
capability-based confinement [24], perhaps from the fear that
if just one inappropriate capability escaped, the whole system
might collapse. As a result, a number of designs have
combined capabilities with authority checks [1], interposition
[12], or even labels [13]. Asbestos demonstrates that with
decentralized labels, capabilities are unnecessary: the labels
themselves can be used to implement a capability system.

Asbestos handles, as communication endpoints, are also
reminiscent of message-based operating systems [4, 20, 25,
30, 31, 33], some of which can confine executable content
[11], others of which have had full-fledged mandatory
access control implementations [3].

Mandatory access control can also be achieved with unmodified
traditional operating systems through virtual machines
[9, 14]. For example, the NetTop project [34] uses
VMware for multi-level security. Virtual machines have two
principal limitations, however: performance [16, 38] and
coarse granularity. One of the goals of Asbestos is to allow
very fine-grained information flow control, so that a single
process can handle differently labeled data. To implement a
similar structure with virtual machines would require a separate
instance of the operating system for each label.

3  Operating  System  Primitives 

Asbestos  is  a  message-passing  operating  system.  User-level 
processes  communicate  with  one  another  via  messages, 
which  are  sent  to  communication  ports  called  handles.  This 
section  describes  characteristics  of  these  primitives  important 
for  understanding  Asbestos’s  more  unusual  features. 

3.1  Handles 

Asbestos handles combine aspects of communication endpoints,
capabilities, and kernel-protected information labels.
An Asbestos handle is simply a 61-bit number, and applications
can treat arbitrary numbers as handles. Two kernel
structures monitor and control handle usage. A single routing
table stores the device entity responsible for messages sent
to each handle; and per-process labels, described further below,
control information flow relative to individual handles.
Different label settings can cause a handle to act like a capability,
a multi-level security (MLS) level, or a combination of
the two.

In  this  code  fragment,  an  application  creates  a  new  handle. 
(Some  arguments  have  been  left  out  for  conciseness.) 

handle_t h;
r = sys_new_handle(&h, ...);

The new_handle system call generates a new handle, grants
the process's labels access to that handle (see below), stores
the handle in the routing table, and optionally sets up an
in-kernel message queue to receive messages sent to that handle.
Each call to new_handle returns a handle
not seen before, until the space wraps around; at a rate of
1 billion handle creations per second, this would take about
73 years. Results of new_handle are ephemeral rather than persistent;
a different handle sequence might arise from every
boot. They are also unpredictable. There is no way to create
a message queue for a known handle value, and the system
encrypts new handle values with a block cipher. This closes
certain covert channels (applications cannot tell how many
other handles have been created) and will facilitate virtualization,
since there are no known global handle values in the
system.
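The 73-year figure follows directly from the 61-bit handle width: 2^61 handles at 10^9 creations per second is roughly 2.3 × 10^9 seconds. The short sketch below reproduces that arithmetic; the function name `years_until_wrap` and the 365.25-day year are assumptions made for this example.

```c
#include <stdint.h>

/* Years until a handle space of the given bit width is exhausted at a
 * steady creation rate. Assumes bits < 64 and a 365.25-day year. */
double years_until_wrap(unsigned bits, double creations_per_second) {
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    double handles = (double)((uint64_t)1 << bits);
    return handles / creations_per_second / seconds_per_year;
}
```

Calling `years_until_wrap(61, 1e9)` gives a value just over 73, consistent with the estimate in the text.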

Processes may try to send messages to any handle. The
kernel checks the routing table to see if the handle has a controlling
device (either a process message queue or some
in-kernel entity). If so, the message is delivered there, subject to
access control checks; otherwise, it is treated as a label check
failure. Access control checks are implemented using labels,
as described in Section 4.

Other handle system calls include handle_transfer, by
which one process can transfer ownership of a handle to another,
and set_handle_label, which is described below. Each
active handle corresponds to a 64-byte kernel-private data
structure, called the vnode. The routing table simply maps
handles to vnodes. Vnodes are reference counted; when all
kernel references to a vnode disappear, the kernel may reuse
its memory. However, the kernel will not reuse the handle associated
with the vnode, since that handle might still be in use
within a process.

3.2  Messages 

Asbestos defines a small number of message types and conventions
guiding their use. For example, the LOOKUP message
type is used to look up entries in a directory or directory-like
object. We intend all applications and OS services to use
this small set of message types in a uniform manner, which
simplifies virtualization of system functionality.

Messages  have  six  components:  a  destination  handle,  a 
type,  a  message  code,  an  ID,  an  optional  reply  handle,  and 
an  optional  payload.  The  type  defines  the  class  of  operation 
being  requested.  The  device  that  receives  the  message  will 
send  replies  to  the  reply  handle.  The  message  code  can  supply 
an  argument  for  request  types;  for  reply  types,  it  reports  any 
error  result.  Finally,  the  message  ID  helps  match  replies  to  the 
corresponding  requests. 

Message  types  include: 

LOOKUP  Looks up entries in directories or directory-like objects.
The payload is the name of the entry to look up.
Replies use type LOOKUP_R (a general convention);
their payload generally contains one or more handle values
for the entries.

READ,  WRITE  Requests  data  from  an  object,  or  writes  data 
to  an  object. 

CONTROL  A  catchall  message  for  non-read/write  access. 
The  message  code  specifies  behavior  further  (e.g.  the  file 
system  responds  to  STAT  control  messages). 
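The six message components and the reply convention described above could be laid out as in the following sketch. Everything here is an assumption made for illustration rather than the Asbestos definition: the field types, the enum numbering, the 64-bit `handle_t`, the `NO_HANDLE` sentinel, and the helper `make_lookup_reply` are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t handle_t;        /* assumed: a 61-bit handle held in 64 bits */
#define NO_HANDLE ((handle_t)0)   /* hypothetical "no reply handle" marker */

/* Hypothetical numbering of the message types named in the text. */
enum msg_type { MSG_LOOKUP, MSG_LOOKUP_R, MSG_READ, MSG_WRITE, MSG_CONTROL };

struct message {
    handle_t dest;         /* destination handle */
    enum msg_type type;    /* class of operation being requested */
    int32_t code;          /* request argument, or error result in a reply */
    uint32_t id;           /* matches replies to their requests */
    handle_t reply;        /* optional reply handle */
    const void *payload;   /* optional payload, NULL if absent */
    size_t payload_len;
};

/* Build the reply to a LOOKUP request: it goes to the request's reply
 * handle, uses type LOOKUP_R, and carries the same ID. */
struct message make_lookup_reply(const struct message *req, int32_t code) {
    struct message r = {0};
    r.dest = req->reply;
    r.type = MSG_LOOKUP_R;
    r.code = code;
    r.id = req->id;
    r.reply = NO_HANDLE;
    return r;
}
```

Copying the ID into the reply is what lets a sender with several outstanding requests pair each reply with the request that caused it.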

Messages are stored for delivery on in-kernel message
queues. A message queue is associated with one or more handles,
all of which must be controlled by the same process; all
messages sent to those handles are delivered to the message
queue. Message queues are implemented as circular buffers
of pages. Small message data is copied into the page containing
the message header; large data uses copy-on-write page
mappings. When delivering messages from the queue to the
process, we map pages whenever possible. Message access
control checks, which use labels, are performed at send time,
but any changes to the receiving process’s labels are delayed
until the message is actually delivered.

3.3  System  calls 

Most Asbestos functionality, aside from sending and receiving
messages and managing handles, will eventually be accessible
through messages. Currently, though, we do support
other system calls. Most importantly, Asbestos processes,
like exokernel processes [7], can manage their virtual address
space, and that of other processes they control, by mapping
and unmapping pages to and from virtual addresses.
The relevant system calls are page_alloc, page_map and
page_unmap. However, Asbestos will not support writable
shared memory: if two processes share a page, it is mapped
either read-only or copy-on-write.

4  Asbestos  Labels 

All Asbestos access control and information flow checks are
implemented with a single primitive, labels. Labels are flexible
enough to implement a wide range of discretionary and
mandatory access policies, and the Asbestos label design is
one of our main contributions. This section incrementally develops
and explains that design. Labels have been used previously
in operating systems and secure languages. Asbestos
labels differ in their support for effective labels, temporary
restrictions that can help implement discretionary policies,
and their integrated support for decentralized declassification.
The impatient reader may wish to examine Figures 1 and 2 to
see the full design.

4.1  Process  labels 

Each process P has two labels, a send label $P_S$ and a receive label $P_R$. A
process's current access restrictions are stored in its send label, while the
receive label holds the maximum restrictions it is willing and allowed to accept
from others. The core message access check is

$$P_S \le Q_R, \tag{1}$$

meaning P cannot send a message to Q unless P's send label is less than or equal to
Q's receive label. If this check succeeds and a message is delivered, information
flows from P to Q, so we contaminate Q with P's restrictions:

$$Q_S \leftarrow \max(Q_S, P_S).$$

These two operations are the core of any information flow system, and of many
previous OS label designs [8, 18, 23].

In Asbestos, a label is a function from handles to levels, which are members of the
ordered set {*, 0, 1, 2, 3} (where * < 0 < ... < 3). We write labels using set
notation, such as {$h_1$ 0, $h_2$ 1, 2}. The default level, which appears without a
handle at the end of the list, applies to all handles not mentioned explicitly; it
is omitted when obvious from context.

To compare two labels, we compare each of their components:

$$L_1 \le L_2 \quad\text{iff}\quad L_1(h) \le L_2(h) \text{ for all } h.$$

The least-upper-bound and greatest-lower-bound operators, max and min, are defined
similarly; see Figure 1.

For example, consider processes P and Q with

$$P_S = P_R = \{h\ 0,\ 1\},$$

$$Q_S = Q_R = \{h\ 3,\ 1\}.$$

P can send to Q, since 0 ≤ 3; but Q cannot send to P. Low levels like * are more
permissive than high values when they appear in send labels, but more restrictive
when they appear in receive labels. In general, making the system more permissive
should require special privilege, described further in Section 4.4.
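
The comparison and contamination rules above can be exercised with a small model. The following Python sketch is an illustration only, not Asbestos code: representing a label as an `(entries, default)` pair, encoding `*` as `-1`, and the helper names `level`, `points`, `leq`, and `combine` are all assumptions made for this example.

```python
STAR = -1  # assumed encoding of the special level "*", which sorts below 0

def level(label, h):
    """Level of handle h in an (entries, default) label; None selects the default."""
    entries, default = label
    return entries.get(h, default)

def points(*labels):
    """Every handle named in any label, plus None standing for all other handles."""
    return set().union(*(entries for entries, _ in labels)) | {None}

def leq(a, b):
    """Label comparison: a <= b iff a(h) <= b(h) for every handle h."""
    return all(level(a, h) <= level(b, h) for h in points(a, b))

def combine(op, a, b):
    """Apply op (max or min) component-wise, including the default level."""
    entries = {h: op(level(a, h), level(b, h)) for h in points(a, b) if h is not None}
    return entries, op(a[1], b[1])

# The example from the text: P_S = P_R = {h 0, 1} and Q_S = Q_R = {h 3, 1}.
P_S = P_R = ({"h": 0}, 1)
Q_S = Q_R = ({"h": 3}, 1)

p_to_q = leq(P_S, Q_R)               # check (1) for a message from P to Q
q_to_p = leq(Q_S, P_R)               # check (1) for a message from Q to P
Q_S_after = combine(max, Q_S, P_S)   # Q's send label after P's message is delivered
print(p_to_q, q_to_p, Q_S_after)
```

In this model P may send to Q but not the reverse, and delivery leaves Q's send label unchanged because it already dominates P's.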

4.2  Asymmetry 

Asbestos  send  and  receive  labels  default  to  different  values. 
The  default  send  label  is  {1},  while  the  default  receive  label 
is  {2}.  Furthermore,  these  defaults  lie  in  the  middle  of  the 
label  ordering:  0  and  *  are  less  than  either,  and  3  is  greater. 

These asymmetric, intermediate defaults support more flexible isolation than more
conventional designs. For example, imagine we want to prevent a process P from
sending messages to a process Q, using some handle h. Asbestos supports this in
three distinct ways:

          A           B           C
$P_S$     {h 3, 1}    {1}         {h 2, 1}
$Q_R$     {2}         {h 0, 2}    {h 1, 2}

In column A, we set $P_S(h)$ to 3. Although this prevents communication with Q as
intended, P cannot send to any other process either, except for those with receive
label {h 3}. Such a label change requires special privilege, since it makes the
system more permissive. In column B, we instead set $Q_R(h)$ to 0, restricting the
processes that can send to Q rather than those to which P can send. This time,
normal processes cannot send to Q.

These isolation mechanisms, which are those available in most mandatory access
control systems, limit P's ability to communicate with Q by limiting either P's or
Q's ability to communicate with anyone. However, Asbestos label asymmetry lets us
instead express a policy involving both P and Q, allowing far more flexible
communication while still guaranteeing restricted information flow. In column C, we
set $P_S(h)$ to 2 and $Q_R(h)$ to 1. P clearly cannot send a message to Q. It can
communicate with most other processes, but as it does so,

P, Q               Processes
h, dest            Handles
$P_S$              Process P's send label
$P_R$              Process P's receive label
$h_R$              Handle h's receive label
$Ctl_P$            Process P's control handle
*, 0, 1, 2, 3      Label levels, in increasing order
L, C, D, V, E      Labels (functions from handles to levels)
$L_1 \le L_2$      Label comparison: true if $\forall h,\ L_1(h) \le L_2(h)$
$\max(L_1, L_2)$   Least-upper-bound label: $\{h\ k \mid k = \max(L_1(h), L_2(h))\}$
$\min(L_1, L_2)$   Greatest-lower-bound label: $\{h\ k \mid k = \min(L_1(h), L_2(h))\}$
owned(L)           Owned-handles label: $\{h\ * \mid L(h) = *\} \cup \{h\ 3 \mid L(h) \ne *\}$

Figure 1: Notation.


it will contaminate those processes with the send label {h 2}. Thus, a process X
cannot send a message to Q if it has ever communicated with P, directly or
indirectly. A process that does not want to risk contamination can simply set its
receive label to {1}; since this is more restrictive than the default, it requires
no special privilege.
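
The three columns can be checked mechanically. The sketch below is a hypothetical Python model, not kernel code: labels are `(entries, default)` pairs, and the helper names and the handle name `"h"` are assumptions made for the illustration.

```python
def level(label, h):
    """Level of handle h in an (entries, default) label; None selects the default."""
    entries, default = label
    return entries.get(h, default)

def points(*labels):
    """Every handle named in any label, plus None standing for all other handles."""
    return set().union(*(entries for entries, _ in labels)) | {None}

def leq(a, b):
    return all(level(a, h) <= level(b, h) for h in points(a, b))

def combine(op, a, b):
    entries = {h: op(level(a, h), level(b, h)) for h in points(a, b) if h is not None}
    return entries, op(a[1], b[1])

DEFAULT_SEND, DEFAULT_RECV = ({}, 1), ({}, 2)

# (P_S, Q_R) for each column of the table.
columns = {
    "A": (({"h": 3}, 1), DEFAULT_RECV),
    "B": (DEFAULT_SEND, ({"h": 0}, 2)),
    "C": (({"h": 2}, 1), ({"h": 1}, 2)),
}
blocked = {name: not leq(P_S, Q_R) for name, (P_S, Q_R) in columns.items()}

# Column C in more detail: P may still talk to an ordinary process X ...
P_S, Q_R = columns["C"]
p_to_x = leq(P_S, DEFAULT_RECV)
X_S = combine(max, DEFAULT_SEND, P_S)   # ... but X is then contaminated with {h 2}
x_to_q = leq(X_S, Q_R)                  # so X can no longer reach Q
# A cautious process Y lowers its receive label to {1} and refuses P's messages,
# which keeps its own path to Q open.
p_to_y = leq(P_S, ({}, 1))
y_to_q = leq(DEFAULT_SEND, Q_R)
```

All three columns block P from Q, but only column C leaves both processes free to talk to uncontaminated third parties.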

4.3  Effective  labels 

Asbestos gives applications some control over the access control check used when
sending a message. In particular, applications can further restrict access relative
to the basic send and receive labels. Since the sender chooses which labels to
apply, these effective labels can implement discretionary policies.

Senders provide a contamination label $C_S$ and a verification label $V$ along with
every message. These labels are used to construct effective send and receive labels
$E_S$ and $E_R$, as follows:

$$E_S = \max(P_S, C_S), \qquad E_R = \min(Q_R, V).$$

The kernel then allows delivery only if $E_S \le E_R$. This implies that
$P_S \le Q_R$, so Equation (1) would also allow delivery. $E_S$, rather than $P_S$,
is used to contaminate the receiver's send label. Finally, the kernel reports the
contents of $V$ to the receiver when delivering the message.

For example, imagine a trusted multi-user file server. Two handles $u_1$ and $u_c$
might be allocated for each user ID u. A process that speaks for user u would have
send label {$u_1$ 0} and receive label {$u_c$ 3}. When sending data intended
exclusively for user u, the file server sets $C_S$ to {$u_c$ 3}. Only processes
with receive label {$u_c$ 3} can receive the resulting message. After receiving,
their send labels rise to {$u_c$ 3}, and they lose the ability to talk to non-u
processes (which have receive label {$u_c$ 2}). When receiving data from user u, the
file server conversely checks the $V$ label, approving the message only if
$V(u_1) \le 0$. Processes speaking for u will set $V$ = {$u_1$ 0} on each message.

Processes can also control the messages they are allowed to receive by manipulating
their handle labels. Each handle has its own label, which can be arbitrarily set by
the handle's controlling process. The handle label additionally restricts the
process-wide receive label for messages sent to that handle. When P sends a message
to Q over handle dest, the effective receive label is actually

$$E_R = \min(Q_R, dest_R, V).$$


send(dest, $C_S$, $D_S$, $V$, $D_R$, data)
    Let Q be dest's controlling process
    Let $E_S = \max(P_S, C_S)$
    Let $Q_{newR} = \max(Q_R, D_R)$
    Let $E_R = \min(Q_{newR}, dest_R, V)$
    Let $Q_{own} = owned(Q_S)$
    Requirements:
        (1) $E_S \le E_R$
        (2) $D_R \le dest_R$
        (3) If $D_S(h) < 3$, then $P_S(h) = *$
        (4) If $D_R(h) > *$, then $P_S(h) = *$
    Effects:
        Grant $D_S$, contaminate with $E_S$, then restore owned handles
        $Q_S \leftarrow \max(\min(Q_S, D_S), E_S)$
        $Q_S \leftarrow \min(Q_S, Q_{own})$
        $Q_R \leftarrow Q_{newR}$

new_handle(L)
    Let h be an unused handle
    Effects:
        $h_R \leftarrow L$
        $h_R(h) \leftarrow 0$
        $P_S(h) \leftarrow *$
    Return h

set_handle_label(dest, L)
    Requirement:
        dest was created by P
    Effect:
        $dest_R \leftarrow L$

Figure 2: Some Asbestos label operations. P is the calling process.

By  controlling  the  distribution  of  these  differently-privileged 
handles,  processes  can  implement  a  wide  range  of  policies. 
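
To make the effective-label check concrete, here is a minimal Python sketch under stated assumptions. The `(entries, default)` label representation, the helper names, the handle name `"u"`, and the choice of `{*}` and `{3}` as no-op values for $C_S$ and for $V$ and $dest_R$ are all hypothetical; the text does not say what the kernel uses when a sender leaves these labels out.

```python
STAR = -1  # assumed encoding of the special level "*"

def level(label, h):
    entries, default = label
    return entries.get(h, default)

def points(*labels):
    return set().union(*(entries for entries, _ in labels)) | {None}

def leq(a, b):
    return all(level(a, h) <= level(b, h) for h in points(a, b))

def combine(op, a, b):
    entries = {h: op(level(a, h), level(b, h)) for h in points(a, b) if h is not None}
    return entries, op(a[1], b[1])

BOTTOM, TOP = ({}, STAR), ({}, 3)   # assumed no-op values: C_S = {*}, V = dest_R = {3}

def deliverable(P_S, Q_R, C_S=BOTTOM, V=TOP, dest_R=TOP):
    """Return (allowed, E_S) using E_S = max(P_S, C_S), E_R = min(Q_R, dest_R, V)."""
    E_S = combine(max, P_S, C_S)
    E_R = combine(min, combine(min, Q_R, dest_R), V)
    return leq(E_S, E_R), E_S

server_S = ({}, 1)                  # sender with the default send label
C_S = ({"u": 3}, STAR)              # contaminate the message with {u 3}
alice_R = ({"u": 3}, 2)             # a receiver cleared for u's data
other_R = ({}, 2)                   # a receiver with the default receive label

ok_alice, E_S = deliverable(server_S, alice_R, C_S=C_S)
ok_other, _ = deliverable(server_S, other_R, C_S=C_S)
alice_S = combine(max, ({}, 1), E_S)   # the receiver is contaminated with E_S, not P_S
can_leak = leq(alice_S, other_R)
# A verification label can only restrict: V = {u 0, 3} demands E_S(u) <= 0,
# which a sender whose send label holds u at the default level 1 cannot meet.
ok_verified, _ = deliverable(server_S, alice_R, V=({"u": 0}, 3))
```

The contaminated message reaches only the cleared receiver, who afterwards cannot forward it to a default receiver.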

4.4  Ownership  and  decontamination 

Finally, the special * level lets processes distribute handle access and declassify
information in a decentralized way. A process with $P_S(h) = *$ is said to own
handle h and has two privileges with respect to it. First, P cannot be contaminated
by other processes with respect to h; P's send label will stay at {h *} even if it
receives a message with effective send label {h 3}. Second, P can decontaminate
other processes (lower their send labels or raise their receive labels) with respect
to h. Senders provide two decontamination labels $D_S$ and $D_R$ with every message,
in addition to the contamination and verification labels $C_S$ and $V$. The $D_S$
label attempts to make the receiver's send label more permissive by lowering some of
its levels; conversely, the $D_R$ label attempts to make the receiver's receive
label more permissive by raising some of its levels. As we mentioned, both of these
operations are privileged, but that privilege is simply modeled by $P_S(h) = *$. The
kernel ensures that whenever $D_S(h) < 3$ or $D_R(h) > *$, the process has
$P_S(h) = *$. Then, assuming the message is delivered, the kernel sets

$$Q_S \leftarrow \max(\min(Q_S, D_S), E_S) \quad\text{and}$$

$$Q_R \leftarrow \max(Q_R, D_R),$$

to actually accomplish the decontamination.

Decontamination of receive labels is particularly sensitive, since it can allow
concurrent or later contamination. A process may not want to allow itself to become
arbitrarily contaminated. Therefore, the kernel checks the $D_R$ label in an
additional way: it rejects the message unless $D_R \le dest_R$. Since processes have
full control over their handle labels, they have full control over how much
contamination they are willing to accept.

A process can decontaminate other processes with respect to any handles it creates,
since the new_handle system call sets $P_S(h) = *$ for each created handle h. This
allows the process to freely distribute its handles to others. The new_handle call
also sets $h_R(h) = 0$. Since this is less than the default level 1, only processes
that have been explicitly declassified with respect to h can send messages to h. The
kernel never reuses handles, so the result of a new_handle call has never been
declassified before; a process can be sure that it can control the set of processes
that can send to h by controlling h's distribution. If the process wants to accept
messages from anyone, it can simply reset the handle label using a set_handle_label
system call. Finally, a setlabel system call lets a process give up ownership of a
handle; setlabel can contaminate even handles at level *.

Ownership solves, in a decentralized way, problems that require global trust in most
label systems: information declassification and label initialization. For instance,
the ownership rules let handles implement capability semantics. When a process P
creates a handle h, the kernel sets $P_S(h) \leftarrow *$ and $h_R(h) \leftarrow 0$.
Since the kernel never reuses handles, every other process Q must have
$Q_S(h) \ge 1$ and can't send messages to h. To allow sending to h, P sets $Q_S(h)$
to 0 or * using a decontamination label; * lets Q grant the capability to others,
while a process with value 0 can send a message to h but cannot transfer that right,
either directly or by acting as a proxy. Of course, if P doesn't want h to act as a
capability, it can reset $h_R$ or contaminate its $P_S(h)$ label; capability
semantics are possible but not required.
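
The capability behavior described above follows from the send rules in Figure 2. The Python sketch below models those rules for illustration only. It assumes labels are `(entries, default)` pairs with `*` encoded as `-1`, that omitted $C_S$ and $D_R$ behave as `{*}` and omitted $D_S$ and $V$ as `{3}`, and that Q receives on a handle whose label has been reset to `{3}`; none of these details come from the text.

```python
STAR = -1  # assumed encoding of "*"

def level(label, h):
    entries, default = label
    return entries.get(h, default)

def points(*labels):
    return set().union(*(entries for entries, _ in labels)) | {None}

def leq(a, b):
    return all(level(a, h) <= level(b, h) for h in points(a, b))

def combine(op, a, b):
    entries = {h: op(level(a, h), level(b, h)) for h in points(a, b) if h is not None}
    return entries, op(a[1], b[1])

def owned(label):
    """Owned-handles label from Figure 1: * where label is *, 3 everywhere else."""
    entries, default = label
    pick = lambda v: STAR if v == STAR else 3
    return {h: pick(v) for h, v in entries.items()}, pick(default)

BOTTOM, TOP = ({}, STAR), ({}, 3)

def send(P_S, Q_S, Q_R, dest_R, C_S=BOTTOM, D_S=TOP, V=TOP, D_R=BOTTOM):
    """Return the receiver's new (Q_S, Q_R), or None if a requirement fails."""
    E_S = combine(max, P_S, C_S)
    Q_newR = combine(max, Q_R, D_R)
    E_R = combine(min, combine(min, Q_newR, dest_R), V)
    privileged = all(level(P_S, h) == STAR for h in points(P_S, D_S, D_R)
                     if level(D_S, h) < 3 or level(D_R, h) > STAR)  # requirements (3), (4)
    if not (leq(E_S, E_R) and leq(D_R, dest_R) and privileged):     # requirements (1), (2)
        return None
    new_S = combine(max, combine(min, Q_S, D_S), E_S)   # grant D_S, contaminate with E_S
    return combine(min, new_S, owned(Q_S)), Q_newR      # restore owned handles

P_S, P_R = ({"h": STAR}, 1), ({}, 2)   # P has just created h, so P_S(h) = *
h_R = ({"h": 0}, 2)                    # new_handle sets h_R(h) = 0
Q_S, Q_R = ({}, 1), ({}, 2)            # Q starts with the default labels

before = send(Q_S, P_S, P_R, h_R)                        # Q -> h, no capability yet
Q_S, Q_R = send(P_S, Q_S, Q_R, TOP, D_S=({"h": 0}, 3))   # P grants Q level 0 for h
after = send(Q_S, P_S, P_R, h_R)                         # Q -> h again
regrant = send(Q_S, ({}, 1), ({}, 2), TOP, D_S=({"h": 0}, 3))  # Q tries to pass it on
```

Here `before` and `regrant` are rejected while `after` succeeds, and P's send label still holds h at `*` after receiving Q's message.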

4.5  Examples 

The values stored in the send and receive labels of a process govern the process's
ability to communicate with other entities. Using the unique features of the
Asbestos label system, such as asymmetry and the split process label design, we are
able to implement a wide range of security schemes. We now present, as a sample of
the possibilities, two standard security schemes that can be implemented using
Asbestos labels.

4.5.1  Dynamic  security  and  process  isolation 

First, we consider two slightly different ways in which a process P can isolate a
process Q, restricting the processes to which Q can send and receive messages.

In one case, which we call dynamic security, Q can send messages only to P but can
receive messages from anyone. Assume that P owns Q's control handle and thus can
change its receive label. As mentioned earlier, P can restrict Q by increasing a
portion of its send label to a higher level than that in any other process's receive
label (except, of course, for P). To accomplish these requirements, P generates a
new handle, j, which has at most a default value of 2 in all processes' receive
labels; then sets $Q_R(j) = Q_S(j) = 3$ and $P_R(j) = 3$, resulting in the following
label setup:


Labels     P      Q      Others
Send       j *    j 3    j 1
Receive    j 3    j 3    j 2

Q can still receive messages from anyone. Messages sent from Q to processes other
than P will not go through, since j's level in all other processes' receive labels
has the default value of 2, and thus send requirement (1) of Figure 2 does not hold.

In another policy, process isolation, P wants to completely isolate Q so it can send
and receive messages only from P. This starts out exactly as in the dynamic security
policy, but P goes further by restricting Q's ability to receive. It generates a new
handle, k, which has at least the default value of 1 in all processes' send labels,
and sets $Q_R(k) = Q_S(k) = 0$, resulting in the following setup:


Labels     P           Q           Others
Send       j *, k *    j 3, k 0    j 1, k 1
Receive    j 3, k 2    j 3, k 0    j 2, k 2

Q  is  now  able  to  receive  messages  only  from  processes  whose 
send  label  contains  k  at  a  level  less  than  or  equal  to  0.  Since 
the  default  level  for  the  send  label  is  1,  only  P  can  now  send 
messages  to  Q. 
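
Both tables can be checked against the basic rule $P_S \le Q_R$. The following Python sketch is a hypothetical model (the `(entries, default)` representation and the helper names are assumptions), with unlisted handles at the default send level 1 and receive level 2.

```python
STAR = -1  # assumed encoding of "*"

def level(label, h):
    entries, default = label
    return entries.get(h, default)

def points(*labels):
    return set().union(*(entries for entries, _ in labels)) | {None}

def leq(a, b):
    return all(level(a, h) <= level(b, h) for h in points(a, b))

# Each process maps to (send label, receive label).
dynamic = {
    "P": (({"j": STAR}, 1), ({"j": 3}, 2)),
    "Q": (({"j": 3}, 1), ({"j": 3}, 2)),
    "Other": (({}, 1), ({}, 2)),
}
isolated = {
    "P": (({"j": STAR, "k": STAR}, 1), ({"j": 3, "k": 2}, 2)),
    "Q": (({"j": 3, "k": 0}, 1), ({"j": 3, "k": 0}, 2)),
    "Other": (({}, 1), ({}, 2)),
}

def can_send(table, src, dst):
    """Basic check (1): the sender's send label must not exceed the receiver's receive label."""
    return leq(table[src][0], table[dst][1])

dynamic_result = {pair: can_send(dynamic, *pair)
                  for pair in [("Q", "P"), ("Q", "Other"), ("Other", "Q")]}
isolated_result = {pair: can_send(isolated, *pair)
                   for pair in [("P", "Q"), ("Q", "P"), ("Q", "Other"), ("Other", "Q")]}
```

Under dynamic security Q can still hear from others but can answer only P; under process isolation the path from other processes into Q is closed as well.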

4.5.2  Multi-level  security 

Most MLS systems are static, with predefined secrecy levels. Asbestos labels can
implement a dynamic policy, including specific security levels and arbitrary
sub-levels. This implementation is also virtualizable; there may be several groups
of processes participating in different MLS schemes at the same time.

Each process participating in an MLS space has a notion of current and maximum
security level. In this scheme "maximum level" is analogous to a security clearance.
Process P cannot send a message to process Q if P's current security level is above
Q's maximum security level. In this example, we present a scheme with secret and top
secret processes (represented by handles s and t, respectively), as well as
unclassified processes.

Parts of the TCB¹ (such as the identity server and the multi-level filesystem)
possess s * and t *. A process M is in charge of this MLS space and is part of the
TCB. Process P authenticates itself to M, which can then set P's labels to
correspond to the security level P should be able to use.

All processes start out with current security level set to "unclassified," and thus
security handles in the send label set to their default level, 1. Once a process S's
security level is set to "secret," perhaps because it accesses a secret file from
the multi-level filesystem, S can no longer send messages to unclassified processes.
Since this is a restriction of S's ability to send, some handle in $S_S$ must
increase in level. This is accomplished by including a contamination label $C_S$ on
the message sent from the filesystem to S, setting $S_S(s) = 3$ and resulting in the
following labels:


Labels     U      S      T
Send       s 1    s 3    s 1
Receive    s 2    s 3    s 3

S is now able to send messages only to processes that have had their receive label
explicitly relaxed with respect to s, i.e., processes with a security level of at
least "secret" (S and T in the example above).

When T (whose security level is "top secret") receives a message from S (after S has
accessed secret data), the {s 3} label will contaminate T, restricting it to sending
messages to "secret" and "top secret" processes only. If T receives "top secret"
information, $T_S(t)$ will be increased as well, leading to the following label
state:


Labels     U           S           T
Send       s 1, t 1    s 3, t 1    s 3, t 3
Receive    s 2, t 2    s 3, t 2    s 3, t 3
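
The final label state can be turned into a who-can-send-to-whom matrix. This Python sketch is only an illustrative model with hypothetical helper names; it applies the basic check $P_S \le Q_R$ to the table above, with unlisted handles at the default send level 1 and receive level 2.

```python
def level(label, h):
    entries, default = label
    return entries.get(h, default)

def points(*labels):
    return set().union(*(entries for entries, _ in labels)) | {None}

def leq(a, b):
    return all(level(a, h) <= level(b, h) for h in points(a, b))

# (send label, receive label) for the unclassified, secret, and top secret processes.
mls = {
    "U": (({"s": 1, "t": 1}, 1), ({"s": 2, "t": 2}, 2)),
    "S": (({"s": 3, "t": 1}, 1), ({"s": 3, "t": 2}, 2)),
    "T": (({"s": 3, "t": 3}, 1), ({"s": 3, "t": 3}, 2)),
}

# Every (sender, receiver) pair the kernel would allow.
allowed = {(src, dst) for src in mls for dst in mls
           if leq(mls[src][0], mls[dst][1])}
for src, dst in sorted(allowed):
    print(src, "->", dst)
```

Information flows only upward or sideways in this state: no pair sends from a more contaminated process to a less cleared one.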

4.5.3  Discussion 

These examples of security mechanisms are varied in their semantics, and yet the
label scheme we have presented is able to implement them both. Many other security
policies may be implemented with labeling. Our label scheme is simple yet powerful.
The most advantageous part is that the correctness of the policy implementation
relies only on simple label properties.

4.6  Implementation 

In user space, a label is represented as an array of handle values plus a default
level. A 64-bit number can represent a label entry: the upper 61 bits are the handle
value, the lower 3 bits encode its level in that label.
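
A minimal sketch of such an entry in Python follows. The 61/3 bit split comes from the text; the particular mapping from the five levels to 3-bit codes, and the function names, are assumptions made for this example.

```python
# Assumed 3-bit codes for the five levels; the text does not give the real mapping.
LEVEL_CODE = {"*": 0, 0: 1, 1: 2, 2: 3, 3: 4}
CODE_LEVEL = {code: lvl for lvl, code in LEVEL_CODE.items()}

def pack_entry(handle, lvl):
    """Pack a 61-bit handle value and a level into one 64-bit label entry."""
    if not 0 <= handle < 1 << 61:
        raise ValueError("handle must fit in 61 bits")
    return (handle << 3) | LEVEL_CODE[lvl]

def unpack_entry(entry):
    """Recover (handle, level) from a packed entry."""
    return entry >> 3, CODE_LEVEL[entry & 0b111]

entry = pack_entry(5, "*")
largest = pack_entry((1 << 61) - 1, 3)
roundtrip = unpack_entry(pack_entry(123456789, 2))
```

Keeping the level in the low bits means entries for the same handle differ only in their last three bits, so a lookup can compare `entry >> 3` without decoding the level.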

¹ Trusted computing base.


In kernel space, labels are implemented similarly, but the array uses 32-bit
pointers to vnodes; since those pointers are 8-byte aligned, the lower 3 bits are
again available for the level. With reference counting and "copy-on-write" updates,
multiple entities can share the same label, curtailing memory use. To minimize
allocation, small vnode arrays are stored in the same structure as the label header.
Although label operations, such as comparison and max, are quite common in Asbestos,
we have not yet optimized our implementation. Nevertheless, Section 8.1 presents the
results of some microbenchmarks.

5  Label  Save/Restore 

Asbestos labels as described so far apply at the coarse granularity of entire
processes. This makes sense for an operating system: the OS can control information
flow between processes using hardware protection mechanisms, such as virtual memory
page protection, but process internals are a black box. Unfortunately, such
coarse-grained labels would make it impossible to run untrusted processes that speak
for different users at different times, an important application category that
includes most services. If a process can speak for more than one user, the OS must
essentially trust it to keep user data internally isolated. Alternately,
language-level flow control mechanisms can enforce access control policies at a much
finer granularity, although communication with the outside world (i.e., other
processes) can be awkward to model.

This section presents Asbestos's label save/restore functionality, which implements
a middle ground. With label save/restore, a single untrusted process can encompass
any number of independent label spaces. Information flow between these spaces is
strictly protected by the operating system, at the granularity of memory pages.

5.1  Design 

Consider our example of a Web server running multiple independent services, each of
which is handling multiple concurrent connections for multiple users. We would like
to prevent services from modifying one another's data and keep each user's data
isolated from other users. Given Asbestos's flexible labels, this would be
relatively easy, if we had one process per service-user pair. Unfortunately, OSes
typically have poor support for such a large number of processes. The goal of label
save/restore is to enforce the same security guarantees as this fully forked model,
but at a lower cost.

Our mechanism design was motivated by the non-blocking event-driven architecture
that underlies many of today's fast servers [17, 29, 35, 37]. Although some of these
servers present the user with a thread-like API, all of them can look like
single-process event-driven servers from the OS's perspective. These servers are
built around a simple scheduling loop; roughly:

    while (1) {
        get_next_event();
        process_event();
    }

Note that at the end of each iteration, this connection loop has no stack or
register state that depends on the event processed during that iteration. This is
intentional: servers are designed to handle multiple concurrent and independent
users whose events are interleaved in arbitrary order; state from one event will
likely not be useful for the next. Thus, as long as the system preserves any changes
to heap connection state, it would be safe to fork at the beginning of each
iteration, and exit at the end. Assume, then, that we fork once per iteration. The
"next event" (which, in Asbestos, is a message and its associated label
contamination and decontamination) is delivered into the forked copy. Thus, the
original process's label never changes, giving us exactly the label properties we
want. Each message is delivered into a process with pristine labels, and
"subprocesses" belonging to different users do not contaminate one another.
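
The effect of forking once per iteration can be seen in a toy model that tracks only the handler's send label. The Python sketch below is an assumption-laden illustration, not the Asbestos mechanism: labels are `(entries, default)` pairs, `*` is encoded as `-1`, and the user handles `"u"` and `"v"` are hypothetical.

```python
STAR = -1  # assumed encoding of "*"

def level(label, h):
    entries, default = label
    return entries.get(h, default)

def points(*labels):
    return set().union(*(entries for entries, _ in labels)) | {None}

def leq(a, b):
    return all(level(a, h) <= level(b, h) for h in points(a, b))

def combine(op, a, b):
    entries = {h: op(level(a, h), level(b, h)) for h in points(a, b) if h is not None}
    return entries, op(a[1], b[1])

BASE = ({}, 1)   # pristine send label of the event loop
events = [("u", ({"u": 3}, STAR)), ("v", ({"v": 3}, STAR)), ("u", ({"u": 3}, STAR))]

def run(events, restore):
    """Return the send label in force while each event is being handled."""
    label, seen = BASE, []
    for user, contamination in events:
        label = combine(max, label, contamination)   # delivery contaminates the handler
        seen.append((user, label))
        if restore:
            label = BASE                             # back to the saved, pristine label
    return seen

plain = run(events, restore=False)    # one long-lived process
forked = run(events, restore=True)    # fork (or save/restore) once per iteration

v_only_R = ({"v": 3}, 2)              # a receiver cleared for v's data only
plain_can_reply = leq(plain[1][1], v_only_R)
forked_can_reply = leq(forked[1][1], v_only_R)
```

Without restoring, the handler for v's event already carries u's contamination and cannot reply to a v-only receiver; with restoring, each event is handled under its own user's label alone.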

Of course, heap connection state must be maintained. A subprocess should have write
access to its heap pages. Any data belonging to other users should be inaccessible;
in fact, other subprocesses' changes to VM state should be invisible. The shared
stack can be visible (it is by definition untainted), but any changes should be
copy-on-write (so the subprocess cannot leak information back to the base process).
A fully-forked model, with one independent service per user, would of course
preserve changes to shared state; we are giving up some flexibility by exiting each
subprocess at the end of the event loop. However, this flexibility isn't critical
for most event-driven services since, again, they aim to perform independently for
different users.

Finally, there is the matter of how a subprocess should be defined. In Asbestos,
each subprocess is defined by its send label. To mark subprocess memory, we
additionally attach a label to each physical page in the system. A subprocess's
persistent heap space then consists of those pages with the same label as the
subprocess.

This subprocess-like functionality is implemented with three system calls, vm_save,
vm_restore, and page_taint, and VM page table manipulations. The vm_save call saves
the current process's page table and register state and waits for a message to
arrive. When a message arrives, it figures out which subprocess to run by
calculating the labels induced by message delivery. It then updates the process's
page tables to give appropriate access to memory (read/write, copy-on-write, or
none) based on page labels. When the process calls vm_restore, the subprocess page
table is thrown away and the process jumps back to the vm_save call. Finally,
subprocess memory is allocated by page_taint, which marks existing pages with a
given label.

To illustrate, consider a Web server with one subprocess per user. The main event
loop would look up connection state information for each incoming message, process
that request, then restore to a pristine state to wait for the next message. In
Asbestos:

    1  vm_save(&msg);
    2  if (new_state_msg(msg))
    3      new_connection_state(msg);
    4  else {
    5      cxn = lookup_cxn_state(msg);
    6      process_user_message(msg, cxn);
       }
    7  vm_restore();

The server saves its state before accepting any messages, allowing it to return to
that pristine state. Notifications of new users (line 2) are delivered in
uncontaminated messages. This allows the main subprocess to allocate and taint a
portion of virtual memory for the new user. Without this untainted message, the user
subprocess would not be able to communicate its existence to the main subprocess,
and there would be no way to prevent different users' connection state areas from
colliding. However, the uncontaminated process only knows which users are active,
and where it has allocated memory for them; it cannot access actual user data. Other
messages (lines 5-6) are contaminated by the user's label, and thus start the user
subprocess. All memory changes are transient, except for changes to memory
contaminated with the user's handle. Data for other users is inaccessible, and
private user information cannot be exported to the untainted process (the relevant
pages are copy-on-write), so any messages that contain user data will be
contaminated with the user's label.

5.2  Implementation 

The vm_save call first saves copies of the current page table, register state, and
labels. vm_save takes two arguments, a message queue identifier (that is, a handle)
and a message pointer; it also saves these arguments. Then vm_save delivers a
message by: selecting a message from the specified message queue; processing its
labels, creating the send label corresponding to the relevant subprocess; setting up
the page table appropriately for that subprocess; delivering the message into the
pointer specified by vm_save's caller; and returning. The vm_restore call throws
away the current page table, restores register state and labels to the saved
versions, and delivers a message, using the same code for delivery as vm_save.

The  interesting  step  in  this  process  is  setting  up  the  page 
table.  Figure  3  shows  pseudocode  for  the  relevant  function, 
fork_pages. 

The label_cmp function determines whether or not a page belongs to a particular
subprocess. If, for some handle h, the page's label at h is greater than the
subprocess's label at h, then the page belongs to a different user and is marked
inaccessible. Otherwise, if, for some handle h, the page's label at h is less than
the subprocess's label at h, then the page is less contaminated than the subprocess
and should be marked copy-on-write. Otherwise, the labels are essentially equal, and
the subprocess should share a (possibly writable) copy with the main process. Note
that this lets one subprocess view any changes made in strictly-less-contaminated
subprocesses. For instance, consider a save/restore process with saved label
$L_0$ = {h 0, 1} and subprocess labels $L_1$ = {h 1, 1} and $L_2$ = {h 2, 1}. The
most contaminated subprocess $L_2$ will be able to see changes made by both $L_0$
and $L_1$, although its own changes are invisible to them. Although a deviation from
the forked-subprocess model, this is safe enough: since
$L_0 \le L_1 \le L_2 \le P_R$, the $L_0$ and $L_1$ subprocesses could communicate
with $L_2$ by sending it an appropriately contaminated message.

Handle ownership (that is, handles with $P_S(h) = *$) adds further complications. A
process that owns a handle is privileged with respect to that handle; as part of
that privilege, the process may grant the handle to others. In label save/restore,
we interpret this to mean that a subprocess that gets ownership of a handle may
grant that ownership to later instances of itself. That is, we want handles owned by
a subprocess to still be owned when the subprocess wakes up again. Unfortunately,
page labels won't list those owned handles, and a naive comparison of send labels
might cause the system to fork a new copy-on-write subprocess each time ownership
changes (since the labels are not exactly the same). Thus, label_cmp ignores owned
handles in its comparisons, and we remember and restore each subprocess's full set
of owned handles.

The saved page directory and page tables are treated as having page labels equal to
the saved label. Thus, if a delivered message doesn't change the process's send
label, the "subprocess" works directly on the saved page table; any changes it makes
will persist across the next vm_restore. However, any subprocess with a different
send label will work on a copy-on-write version of the page tables. It may allocate,
free, and taint memory however it likes, but its changes will disappear when it
calls vm_restore, with one safe exception. Any changes the tainted subprocess makes
to portions of virtual memory not mapped in the saved page table may be made
persistent (as long as the allocated pages are tagged with the subprocess's send
label). This is because untainted subprocesses cannot distinguish between unmapped
memory and memory tagged with a tainted label. This leads to significant memory
savings in practice. On learning that a new user session is about to start (which is
uncontaminated information), the subprocess allocates a virtual memory region for
that user; but it need not allocate physical memory for that region. Instead, the
user's own subprocess will allocate physical memory for it as needed.

Our  current  implementation  of  these  system  calls  is  at  best 
naive.  We  copy  page  tables  more  frequently  than  required, 
walk  the  entire  page  table  on  every  save/restore  (rather  than 
just  the  portions  that  might  be  relevant  to  a  given  subprocess), 
and  flush  the  TLB  with  abandon.  None  of  these  problems  are 
fundamental. 


label_cmp(A, B)
    If ∃h with A(h) > B(h) > *,
        Return +1
    Else if ∃h with * < A(h) < B(h),
        Return -1
    Else return 0

fork_pages(L)
    If label_cmp(L_save, L) = 0,
        Share page tables
        // But continue to fix page table permissions
    Else
        Share page tables copy-on-write
    For each page mapping P,
        (1) If label_cmp(P.label, L) > 0,
            // Page belongs to a different subprocess
            Mark P inaccessible
        (2) Else if P is read-only,
            Share P read-only
        (3) Else if label_cmp(P.label, L) < 0,
            // Page is less contaminated than L
            Share P copy-on-write
        (4) Else if label_cmp(P.label, L) = 0,
            // Page belongs to L’s subprocess;
            // our modifications should be saved.
            // But the saved version might be COW.
            // If so, copy it now, so we can
            // distinguish this case from (3).
            If P is copy-on-write, then copy it
            Share P

Figure 3: fork_pages.
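
As a rough illustration of Figure 3, the following Python sketch models label_cmp and the per-page decision made by fork_pages. The representation is an assumption made only for this example: a label is a dictionary from handle names to integer levels with a shared default of 1, the * level is encoded as -1, and the names Label, Page, and page_action are hypothetical rather than taken from the Asbestos sources.

```python
from dataclasses import dataclass

STAR = -1  # assumed encoding of the "*" level, below the numeric levels 0..3


@dataclass
class Label:
    """A label as a handle -> level map plus a default level (assumed model)."""
    levels: dict
    default: int = 1

    def get(self, handle):
        return self.levels.get(handle, self.default)


def label_cmp(a, b):
    """Three-way comparison from Figure 3; handles held at * are ignored."""
    handles = set(a.levels) | set(b.levels)
    if any(a.get(h) > b.get(h) > STAR for h in handles):
        return 1
    if any(STAR < a.get(h) < b.get(h) for h in handles):
        return -1
    return 0


@dataclass
class Page:
    label: Label
    read_only: bool = False
    cow: bool = False


def page_action(page, send_label):
    """Per-page step of fork_pages for a subprocess with send label L."""
    c = label_cmp(page.label, send_label)
    if c > 0:
        return "inaccessible"          # (1) belongs to a different subprocess
    if page.read_only:
        return "share read-only"       # (2)
    if c < 0:
        return "share copy-on-write"   # (3) less contaminated than L
    # (4) belongs to L's subprocess; copy first if the saved version is COW
    return "copy, then share" if page.cow else "share"
```

Under these assumptions, a subprocess whose send label holds U's taint at level 3 sees untainted pages copy-on-write, shares pages tainted by U directly, and cannot access pages tainted by another user V. An extra handle held at * on either side does not change the comparison result.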

5.3  Discussion 

Reasoning  about  Asbestos  labels,  and  the  more  clearly  safe, 
yet  analogous,  model  of  subprocess  forking,  led  us  to  modify 
the  behavior  of  other  system  calls,  and  our  applications,  to 
preserve safety. For instance, our original vm_save design did
not include message delivery. It simply restored the virtual
memory state, registers, and labels to their original values; the
process would continue on uncontaminated until it executed
a normal recv. This opened a storage channel: the uncontaminated
process could discover both how many contaminated
messages were delivered, and their order relative to any
uncontaminated messages. We therefore integrated vm_restore
with the single operation that might or might not contaminate
the process, namely recv, and disallowed the direct use of
recv within a save/restore block. For the final version, we will
further investigate this and other system calls (including, for
example, a call to vm_save within an existing save/restore
block) to determine the maximal safe set of operations.
6  User-Level  Design

Most  of  the  applications  developed  for  Asbestos  are  services, 
where  careful  consideration  of  information  flow  and  access 
control was necessary. In particular, we ensured that if a
front-end service was compromised, the effects would be isolated.

6.1  Fundamental  services 

Identity server To maintain the notion of users, an identity
service, idd, maintains a list of all users. For each user, idd
maintains the username, password, user id, and a taint handle
and grant handle for that user. A process possessing the grant
handle speaks for that user. A process possessing the taint handle
knows some private data about that user.

“System” processes (the file server and network server)
can obtain all grant and taint handles at * from idd. This allows
the file server or network server to handle contaminated
data without becoming contaminated.

Any  process  can  request  information  about  a  user.  The 
process  must  specify  the  username  and  password  for  the  user 
it  is  interested  in;  if  they  match,  idd  replies  with  the  user  id, 
the  grant  handle  at  *,  and  the  taint  handle  at  3. 
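
The lookup just described can be sketched as follows. This is a toy model under stated assumptions, not the idd implementation: the Idd and UserRecord classes, the integer handles drawn from a counter, and the plaintext password field are all hypothetical and exist only to show the shape of the reply (user id, grant handle at *, taint handle at 3).

```python
import hmac
import itertools
from dataclasses import dataclass

STAR = "*"        # the privileged level
TAINT_LEVEL = 3   # level at which the taint handle is returned


@dataclass
class UserRecord:
    password: str       # plaintext only to keep the sketch short
    uid: int
    taint_handle: int
    grant_handle: int


class Idd:
    """Toy identity service: one record per user, looked up by name."""

    def __init__(self):
        self._users = {}
        self._ids = itertools.count(1)
        self._handles = itertools.count(1000)  # stand-in for kernel handles

    def add_user(self, username, password):
        self._users[username] = UserRecord(
            password, next(self._ids), next(self._handles), next(self._handles))

    def lookup(self, username, password):
        """Return uid and (handle, level) pairs, or None on a bad login."""
        rec = self._users.get(username)
        if rec is None or not hmac.compare_digest(
                rec.password.encode(), password.encode()):
            return None
        return {"uid": rec.uid,
                "grant": (rec.grant_handle, STAR),
                "taint": (rec.taint_handle, TAINT_LEVEL)}
```

A caller that supplies a matching username and password receives both handles; any mismatch yields None, so an unauthenticated process learns nothing about a user's handles.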

File  server  It  is  necessary  to  have  some  form  of  persistent 
storage  in  Asbestos,  for  programs,  data  files,  etc.  To  do  so, 
the  file  service,  asfs,  maintains  an  on-disk  filesystem.  The  file 
system  uses  the  taint  handles  from  idd  to  provide  multi-user 
access;  when  a  process  accesses  filesystem  data,  it  becomes 
tainted  with  that  user’s  taint  handle. 

6.2  Network  server 

All access to the network in Asbestos is through one process,
netd, which is responsible for interfacing with the TCP
stack, managing network devices, and creating connections
for other processes. As such, it has a privileged role and must
properly apply restrictions to connections it creates.

An application can send a message to netd requesting a
connection to a remote host or to listen for incoming connections.
In either case, netd performs the requested operation
and grants a handle representing the new socket to the
application at *.

Once a process has a handle to an open connection, it
may perform READ and WRITE operations to transfer data,
CONTROL operations to close the connection or change the
low-water mark, and SELECT operations to determine available
buffer space. On a listening socket, a process may perform
READ operations to accept incoming connections and
CONTROL operations to close the socket.

In  order  to  apply  labeling  to  network  connections,  netd 
optionally  maintains  a  taint  for  each  connection.  A  process 
may  tell  netd  to  add  a  taint  (a  handle)  to  a  connection.  Later, 
when  netd  sends  a  message  in  response  to  an  operation  on  a 
connection,  it  contaminates  the  recipient  with  the  taint  at  3. 
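
A minimal sketch of this per-connection taint, assuming send labels are dictionaries from handles to integer levels with a default of 1 and that * is encoded as -1. The Connection class and deliver function are hypothetical names; the exemption for recipients holding the handle at * follows the earlier statement that system processes can handle contaminated data without becoming contaminated.

```python
STAR = -1      # assumed encoding of "*"
DEFAULT = 1    # assumed default send-label level


class Connection:
    def __init__(self):
        self.taint = None          # optional taint handle for this connection

    def add_taint(self, handle):
        self.taint = handle


def deliver(conn, recipient_send_label):
    """Return the recipient's send label after netd replies on conn."""
    label = dict(recipient_send_label)
    if conn.taint is not None:
        current = label.get(conn.taint, DEFAULT)
        if current != STAR:                  # holders of * stay uncontaminated
            label[conn.taint] = max(current, 3)
    return label
```

An untainted connection leaves the recipient's label unchanged, while a tainted one raises the recipient's level for that handle to 3.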


FIGURE  4:  The  sequence  of  messages  when  processing  a  Web  request. 


6.3  Web  server 

The  Asbestos  Web  server  is  an  alternate  implementation  of 
the  OKWS  design  [17].  In  the  original  OKWS  design,  one 
demultiplexer  accepted  incoming  connections  and  parsed  the 
headers  to  determine  what  service  was  being  requested.  The 
connection was then handed off to a worker devoted to providing
that service. The goal was to isolate services, so that
one  compromised  service  could  not  affect  others. 

The  Asbestos  implementation  of  OKWS  isolates  services 
in  different  workers,  but  also  enforces  user  isolation,  in  order 
to prevent one compromised service from leaking information
about other users. Rather than using a separate process
for each user, OKWS uses vm_save and vm_restore to provide
full isolation of one user’s data from others.

6.3.1  The  launcher 

OKWS first starts launcher, which starts the separate
OKWS components and ensures proper communication privileges
for each. The launcher creates N worker verify handles
(where  N  is  the  number  of  services),  one  for  each  worker. 
These  are  used  to  verify  that  a  worker  process  is  valid. 

The launcher starts demux first, which grants its own handle
(the demux handle) to the launcher. After starting demux,
launcher  grants  demux  each  of  the  worker  verify  handles. 

Next, launcher starts N workers, granting each worker the
demux  handle.  This  allows  each  worker  to  contact  demux  to 
announce  that  it  is  ready  to  service  a  request.  The  launcher 
also  grants  each  worker  its  worker  verify  handle. 

6.3.2  The  demux 

After  sending  the  demux  handle  to  launcher,  demux  waits 
for  each  worker  to  contact  it.  When  a  worker  contacts  demux, 
it  passes  its  worker  verify  handle,  to  prove  to  demux  that  it 
is  a  valid  worker.  After  verifying  worker  W,  demux  creates 
a new handle and sends it to W. The worker later uses this
handle  to  communicate  with  demux. 
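
The startup handshake between launcher, demux, and workers might be modeled as below. This is only a sketch: it treats handles as unguessable 61-bit integers and checks them by value, whereas Asbestos enforces handle possession through labels. The Demux class, launch function, and dictionary layout are hypothetical.

```python
import secrets


def new_handle():
    return secrets.randbits(61)   # stand-in for a kernel-issued handle


class Demux:
    def __init__(self):
        self.handle = new_handle()    # the demux handle, granted to launcher
        self.verify_handles = set()
        self.worker_handles = {}

    def grant_verify_handles(self, handles):
        self.verify_handles = set(handles)

    def register(self, verify_handle):
        """Called when a worker contacts demux; returns its private handle."""
        if verify_handle not in self.verify_handles:
            return None               # not a worker the launcher started
        handle = new_handle()
        self.worker_handles[verify_handle] = handle
        return handle


def launch(n_services):
    verify = [new_handle() for _ in range(n_services)]
    demux = Demux()                          # launcher starts demux first
    demux.grant_verify_handles(verify)       # then grants it the verify handles
    # each worker receives the demux handle and its own verify handle
    workers = [{"demux": demux.handle, "verify": v} for v in verify]
    return demux, workers
```

A worker that presents its verify handle gets back a fresh handle for later communication; a process that was not started by the launcher has nothing valid to present and is refused.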

The sequence of events for handling connections is
shown in Figure 4. The demux contacts netd and opens a listen
socket for incoming connections (message S1). When a
connection arrives (messages 1-2), demux reads enough of
the HTTP headers to determine what user U is making the
request, and what worker W that user is requesting (messages
3-6). Once it has done so, it contacts idd to obtain the
taint handle for that user, sending the username and password
(message 7), receiving U’s taint handle (message 8) in response.
At this point, the demux is ready to hand the connection
off to W; however, it must ensure that as soon as W reads
from U’s socket, it becomes tainted with U’s label.

6.3.3  The  workers 

An OKWS system will typically have many workers running,
each implementing a logically distinct Web service.
Each service sends a READ message (message S2) when it
first starts up, expressing interest in incoming requests. When
the demux is ready to hand a particular worker (call it W)
a connection, it simply replies to this READ message (as in
message 9). The worker can then immediately reply with another
READ message (as in S2), since it is capable of serving
overlapping clients.

The  handoff  shown  in  messages  9  through  11  requires 
care. Each worker maintains server-side state for each active
user it is communicating with. This state includes receive
buffers, send buffers, and perhaps cached session information
that might persist over multiple HTTP sessions. In our implementation,
each worker W sets aside a 64-page region for
each  user  U  that  becomes  active,  and  it  allocates  pages  there 
lazily,  as  the  user  requires  them.  Moreover,  it  taints  this  entire 
region  with  U’s  taint  handle,  so  that  it  might  later  write  to  it 
when  it  has  U’s  taint  handle  in  its  send  label. 

If the demux were to deliver U’s connection tainted with
U  immediately  in  message  9,  the  worker  would  be  at  a  loss.  It 
would  have  to  set  aside  a  region  for  U’s  state,  but  it  could  not 
write  to  any  persistent,  non-tainted  memory  to  indicate  that 
it  had  done  so.  OKWS  on  Asbestos  instead  uses  a  two-phase 
handoff  protocol.  In  message  9,  the  demux  informs  W  that  it 
is  about  to  deliver  a  connection  for  user  U,  but  does  not  taint 
W  as  it  does  so.  W  then  consults  a  table  T  (implemented  as 
a  quadratic  hash  table),  either  finding  a  previously  allocated 
region for U, or allocating a new one. Note that T resides
in  uncontaminated  memory.  When  W  writes  to  T  that  it  has 
allocated  a  region  for  the  user  U,  it  can  later  read  this  mapping 
without  becoming  contaminated. 
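
The first phase of the handoff, in which the still-untainted worker consults T, can be sketched as follows. The paper's T is a quadratic hash table; this example substitutes a Python dict and a slot list, and the RegionTable class, its method names, and the base address are assumptions made for illustration. Only the 64-page region size and 4KB page size come from the text.

```python
PAGE_SIZE = 4096
REGION_PAGES = 64
REGION_BYTES = PAGE_SIZE * REGION_PAGES


class RegionTable:
    """Uncontaminated table T: maps each active user to a reserved region."""

    def __init__(self, base_addr, max_regions):
        self.base_addr = base_addr
        self.slots = [None] * max_regions    # slot index -> user or None
        self.by_user = {}                    # user -> slot index

    def region_for(self, user):
        """Find U's region, or reserve a free one; returns (address, is_new)."""
        if user in self.by_user:
            return self._addr(self.by_user[user]), False
        slot = self.slots.index(None)        # raises ValueError when full
        self.slots[slot] = user
        self.by_user[user] = slot
        return self._addr(slot), True

    def _addr(self, slot):
        return self.base_addr + slot * REGION_BYTES
```

Because the table only records which virtual region belongs to which user, updating it requires no knowledge of U's private data and can be done before the worker accepts the taint.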

After noting this new region assignment in T, W is ready
to accept the connection and the taint, and it does so in messages
10 and 11. Once it has received U’s connection, it enjoys
a  reserved,  pretainted  region  of  pages,  to  which  it  can  write 
persistently  when  contaminated  by  U’s  handle.  The  worker 
then  services  U’s  connection,  parsing  the  request,  sending 
the  reply  back  to  the  client  over  the  connection  (messages 
12-15),  and  closing  the  connection  (messages  16-17).  Once 
complete,  worker  W  calls  vm_restore  and  waits  to  service 
another  connection. 

One interesting issue remains: freeing memory. When
worker W corresponds with U over many HTTP requests, it
can grow and shrink U’s region while tainted, always leaving
at least one page allocated to store a map of the region.
When the user U explicitly logs off, W would like to reclaim
the last page, and reassign U’s region to other users who become
active. To achieve this, W sends a message to the demux,
informing it that U’s region should now be available to
other users. The demux immediately sends back an acknowledgment,
telling W to free the last page in the region. The
demux then waits a random amount of time, on the order of
ten minutes, and then sends a second uncontaminated message,
telling W to mark U’s region unallocated in the table
T.² The region is now available for reassignment to a different
user. Note the demux must be careful to synchronize this
last message with potential requests from U, so as to avoid
race conditions on W.

7  Covert  Channels 

Asbestos labels prevent processes from explicitly transmitting
sensitive information to unauthorized parties. However,
supposedly isolated processes can still communicate information
through covert channels. Our goal is not to eliminate
covert channels (an impossible task), just to make it significantly
harder to leak information than on systems used as
Internet servers today. While high-grade military systems are
required to quantify all covert channels, for Asbestos we content
ourselves with enumerating the channels.

Broadly  speaking,  covert  channels  can  be  categorized  as 
either  timing  or  storage  channels.  Timing  channels  consist 
of  attacks  in  which  process  A  conveys  information  to  B  by 
modulating its use of system resources in a way that observably
affects B’s response time. For instance, A might flush
the  processor  cache  or  cause  the  disk  arm  to  be  moved  farther 
from  a  subsequent  request.  We  are  less  concerned  with  timing 
channels,  as  to  some  extent  they  can  be  mitigated  by  limiting 
processes’  ability  to  measure  time  precisely  [10].  (Asbestos 
offers  no  such  feature,  however,  and  the  problem  admittedly 
becomes  harder  in  the  presence  of  network  communication.) 

Storage channels are caused by any state that can be modified
by process A and observed by B when A is not supposed
to transmit information to B. It was a goal to avoid storage
channels that could be exploited within a single process, so
that at least two cooperating processes are required to communicate
information in violation of a label policy.

The Asbestos design contains two inherent storage channels:
the program counter, and labels. The vm_restore system
call affects the program counter of an untainted process by
restarting a process at its save point with a lower send label.
Two cooperating processes can, for instance, transmit a bit of
information by the order in which they call vm_restore. This
channel is roughly equivalent to the covert channel intentionally
included by the drop-on-exec feature of IX [23].

² demux’s response represents a storage channel, which we mitigate by a
long delay.

The send system call potentially raises the value of the
recipient’s send label to an unanticipated value. This is also
a storage channel, as labels can be observed through lack of
communication. Consider a tainted process A attempting to
communicate a bit of sensitive information to an untainted
process C. An attacker might construct two untainted processes,
B0 and B1, both of which repeatedly send heartbeat
messages to C. By sending a message that contaminates process
Bi, A can communicate the value i to C. Such storage
channels are inherent to any system with run-time checking
of dynamic labels [21].
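
The heartbeat channel can be made concrete with a small simulation. This is a deliberately simplified model, assuming that a message from a contaminated sender to the untainted process C is silently dropped; the function name leak_bit and the boolean "tainted" flags are hypothetical stand-ins for real label checks.

```python
def leak_bit(secret_bit):
    """Simulate A leaking one bit to C through helpers B0 and B1."""
    # B0 and B1 both start untainted and send heartbeats to C.
    tainted = {0: False, 1: False}

    # A (tainted) sends a message to B_i, contaminating it.
    tainted[secret_bit] = True

    # Next round of heartbeats: the label check drops messages from the
    # contaminated helper, so C hears only from the other one.
    heard = {i for i in (0, 1) if not tainted[i]}

    # C infers the bit from which heartbeat went missing.
    (missing,) = {0, 1} - heard
    return missing
```

C never receives anything from A, yet it recovers the bit from the absence of one heartbeat, which is why the text describes labels as observable through lack of communication.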

Both of the above channels require at least two processes,
which means they can be mitigated by restricting access
to the fork device. This illustrates one advantage of the
vm_save/vm_restore page labeling approach, compared to a
more traditional one-label-per-process architecture. Page labeling
reduces concurrency, thereby also reducing the number
of send labels and program counters available as storage
channels at any given time.

Other Asbestos kernel data structures have been carefully
designed to avoid exploitable storage channels. Handles are
generated by incrementing a 61-bit counter, which is a storage
channel. However, since the kernel encrypts the counter
value with a 61-bit block cipher to produce handles, the
user-visible sequence of handles is unpredictable and thus cannot
convey information. The VM system prevents page tables and
page labels from being used as storage channels by ensuring
changes made while tainted are not visible with a lower send
label.
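
The text does not say which 61-bit cipher the kernel uses. As one possible construction, the sketch below builds a keyed permutation on 61-bit values from a 62-bit balanced Feistel network with a SHA-256 round function, then uses cycle-walking to stay inside the 61-bit domain. Every name here is hypothetical, the round count is arbitrary, and the construction is illustrative rather than a vetted cipher.

```python
import hashlib

BITS = 61
HALF = 31                      # 62-bit Feistel, one bit wider than needed
MASK = (1 << HALF) - 1


def _round(key, rnd, value):
    """Keyed round function mapping a 31-bit half to a 31-bit mask."""
    data = key + bytes([rnd]) + value.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") & MASK


def _permute62(key, x):
    """Four-round balanced Feistel: a bijection on 62-bit integers."""
    left, right = x >> HALF, x & MASK
    for rnd in range(4):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << HALF) | right


def encrypt61(key, counter):
    """Map a 61-bit counter to a 61-bit handle, one-to-one for a fixed key."""
    if not 0 <= counter < (1 << BITS):
        raise ValueError("counter out of range")
    x = _permute62(key, counter)
    while x >= (1 << BITS):    # cycle-walk back into the 61-bit domain
        x = _permute62(key, x)
    return x


class HandleAllocator:
    def __init__(self, key):
        self.key = key
        self.counter = 0

    def new_handle(self):
        handle = encrypt61(self.key, self.counter)
        self.counter += 1
        return handle
```

Because the mapping is a permutation, distinct counter values always yield distinct handles, while a process that sees a handle cannot read the counter value from it without the key.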

The  current  implementation  still  has  several  other  storage 
channels  we  intend  to  close  or  limit,  but  we  believe  these  can 
be  mitigated  without  affecting  the  claims  of  the  paper.  For 
example, Asbestos does not yet deal gracefully with certain
forms  of  resource  exhaustion.  Also,  when  a  process  sends  a 
message  to  an  invalid  handle,  or  to  another  process  whose 
label  prevents  it  from  replying,  we  intend  for  send  always  to 
return  the  error  E_MAYBE.  The  implementation  does  not  yet 
do  this  uniformly. 

8  Evaluation 

In  this  section,  we  present  micro-benchmarks  of  the  Asbestos 
kernel  and  an  end-to-end  analysis  of  the  performance  of  the 
services running on top of it. We identify areas where performance
becomes a bottleneck and further work is necessary.

8.1  Micro-benchmarks 

Most Asbestos operations involve label operations. In particular,
all messages exchanged in the system require label
checks  and  potential  label  modifications,  and  page  labeling 
makes  extensive  use  of  label  operations  during  save/restore 
sessions.  To  evaluate  the  performance  of  Asbestos  we  need  to 
quantify  the  cost  of  label  operations.  We  conducted  a  set  of 
micro-benchmarks  that  exercise  seven  basic  label  operations 
that  are  considered  most  common  and  potentially  costly. 


Operation          size = 50   size = 100   size = 200
label_create          1510        1530         1531
label_min             4198        5511         8078
label_max             5521        5561         7644
label_leq              586        1028         2021
label_cmp              540        1040         2130
label_add              179         199          223
label_add (COW)        188         223          246

Figure 5: Label operations average cost measured in cycles.

The functions evaluated are the following:

label_create: create a label of given size,
label_min and label_max: generate a label by applying the
min and max operators shown in Figure 1 to its arguments³,
label_add: add a handle to a label,
label_add (COW): add a handle to a label triggering the
copy-on-write code,
label_leq: “less than or equal” operator (see Figure 1),
label_cmp: label comparison operation returning “less than,”
“equal,” and “greater than” values.

All measurements were taken on a PC equipped with a
2.8GHz  Pentium4  processor  with  1MB  L2  cache  and  1GB  of 
RAM,  running  Asbestos.  All  operations  were  measured  using 
labels  with  50,  100  and  200  handles.  The  numbers  presented 
are  averages  for  3  runs  of  100  iterations  each.  Figure  5  shows 
the  results  of  all  seven  micro-benchmarks. 

label_create performance is dominated by the (constant)
memory allocation cost. The average cycles for label_min
and label_max are dominated by a particular test case whose
cost was an order of magnitude higher than that of all other
test cases. That happened mainly due to memory allocation
and copying involved in that test. All other investigated scenarios
showed that min and max operations scale linearly
with the size of the labels, as did label_leq and label_cmp,
whose behavior was very stable with no odd results. Both
label_add operations’ results showed that the size of the label
does not affect results significantly, mainly because the
cost is dominated by the time spent sorting the array of label
components. Triggering “copy-on-write” has a minimal performance
hit since our current label implementation uses a label
allocation “arena” (implemented as a free-list) that speeds
up label duplication.

Micro-benchmark  results  revealed  certain  points  that 
could  be  optimized,  but  in  general  we  were  able  to  show  that, 
using  our  untuned  label  implementation,  the  average  cost  of 
operations  is  reasonable  and  scales  well  with  the  size  of  the 
labels  involved. 
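
To put the cycle counts in more familiar terms, the sketch below converts a few Figure 5 entries to microseconds at the stated 2.8GHz clock and computes how much each operation's cost grows as the label size quadruples from 50 to 200 handles. The figures are copied from Figure 5; the helper names are arbitrary.

```python
CLOCK_HZ = 2.8e9  # 2.8GHz Pentium 4, as stated in the text

# Cycles at label sizes 50, 100, and 200 handles (from Figure 5).
FIG5 = {
    "label_create": (1510, 1530, 1531),
    "label_leq": (586, 1028, 2021),
    "label_cmp": (540, 1040, 2130),
    "label_add": (179, 199, 223),
}


def cycles_to_us(cycles):
    """Convert a cycle count to microseconds at CLOCK_HZ."""
    return cycles / CLOCK_HZ * 1e6


def growth(name):
    """Cost at size 200 divided by cost at size 50 (size grows 4x)."""
    small, _, large = FIG5[name]
    return large / small


for name in FIG5:
    print(f"{name}: {cycles_to_us(FIG5[name][2]):.2f} us at size 200, "
          f"growth x{growth(name):.2f}")
```

The comparison operators grow by a factor between 3 and 4, consistent with roughly linear scaling, while label_create and label_add stay nearly flat. Even the largest comparison costs under a microsecond at this clock rate.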

8.2  End-to-End  Measurements 

As a proof of concept, we implemented and measured a
version of OKWS running on Asbestos. We did not expect
our system’s latencies and throughput to be competitive with
highly optimized operating systems or Web servers, but we
did expect reasonable baseline performance numbers that we
could improve upon as development continues. Performance
is discussed in Section 8.2.1. More to the point, we did expect
OKWS on Asbestos to be reasonably efficient in terms
of memory utilization, since an additional page of memory is
potentially all that is required to allocate an additional protection
domain. An exploration of memory usage in OKWS can
be found in Section 8.2.2.

³ Note that these operations alter their first operand, e.g. label_min(a, b)
is equivalent to a = label_min(a, b)

Module                    Percent Execute Time
Kernel - misc                    2.70
Kernel - memory mgmt             6.34
Kernel - vm_restore             28.26
Kernel - IPC                    12.53
Kernel - Network                18.47
Console                          6.47
User - Network                  16.85
User - okws-demux                5.60
User - okws-worker               2.77

FIGURE 6: Execution time of various modules while OKWS is under heavy
load.

FIGURE 7: CDF of request time as seen from the client. (Horizontal axis:
time to complete request, in µs.)

8.2.1  Web  Server  Speed

As  shown  in  Figure  6,  performance  is  largely  limited  by 
time  spent  in  the  kernel  and  on  networking.  Over  a  third  of 
the  total  CPU  time  is  spent  on  networking. 

Our baseline measurements were all taken on a 10Mb Ethernet
network with a local Linux client generating requests.
The server is an AMD Athlon 1400 using 64MB of memory.
Clearly support for faster Ethernet drivers and more memory
is critical for gaining more competitive performance numbers.

Due to bugs in our TCP/IP implementation, concurrent
connections led to timeouts and did not offer a performance
gain. Figure 7 shows the overhead of page labeling. Our
server is able to serve over 210 connections per second. With
page labeling disabled, it is possible to serve over 240 connections
per second. Page labeling currently adds about 500 µs
of CPU time to each connection, primarily in vm_restore. A
modern web server should be able to handle thousands of connections
per second, and the time in vm_restore alone currently
limits Asbestos/OKWS to 2000 connections per second.
As our system matures, it will be critical to optimize
vm_restore.
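
The figures in this paragraph can be cross-checked with a little arithmetic. The sketch below treats the reported rates as exact (the text says "over 210" and "over 240") and assumes serialized connections, so the per-connection time is simply the reciprocal of the rate; the helper names are arbitrary.

```python
def per_connection_us(conns_per_sec):
    """Average time per connection, in microseconds, when serialized."""
    return 1e6 / conns_per_sec


def max_rate(cost_us):
    """Upper bound on connections per second given a fixed per-connection cost."""
    return 1e6 / cost_us


# Extra wall-clock time per connection with page labeling enabled.
overhead_us = per_connection_us(210) - per_connection_us(240)

# Relative throughput loss from enabling page labeling.
slowdown = 1 - 210 / 240

# Ceiling implied by about 500 us of vm_restore work per connection.
ceiling = max_rate(500)

print(f"overhead ~{overhead_us:.0f} us, slowdown {slowdown:.1%}, "
      f"ceiling {ceiling:.0f} conn/s")
```

The difference of roughly 600 µs per connection is in line with the stated 500 µs of added CPU time, the throughput loss is about 12%, and a 500 µs fixed cost per connection caps the rate at 2000 connections per second.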

8.2.2  Memory  Usage 

In  Section  5,  we  argued  the  merits  of  page  labeling  over 
the  more  traditional  fork-accept  designs.  Our  measurements 
of  the  Asbestos  prototype’s  memory  utilization  lend  credence 
to  this  claim.  A  minimal  OKWS  worker  process  requires  at 
least  20  pages  of  physical  memory  (each  4KB),  before  it  has 
even served an HTTP request: one page for the page directory,
about four for page tables, one for a user stack, one for
a user exception stack, and others for the BSS and user heap.
Other pages are artifacts of our kernel and application implementation,
though an operating system that uses fewer than
five  pages  per  process  is  difficult  to  imagine. 

The reasonable conclusion to draw is that to support n
users in the fork-accept model, OKWS would require approximately
20n memory pages at a minimum, a cost which will
become prohibitive as n grows large. By contrast, page labeling
for simple Web services achieves the lower bound that we
proposed earlier: one memory page per external client served.

To experimentally verify this claim, we configured a test
client to simulate 100 Web requests on behalf of n different
users, as n grew from 1 to 25. For each run, we captured the
maximum number of pages active at any one given time. For
instance, at n = 1, we measured 1146 total pages in use as the
Web server launched and a maximum of 1264 pages in use as
the 100 serialized requests were made, all on behalf of the
same user. At n = 2, the maximum number of pages in use
increases to 1265. This pattern continues roughly linearly
until n = 25, at which point the maximum number of pages
in use is 1288. Although we can improve upon absolute memory
usage, the overall trend is encouraging: for some constant
C, OKWS on Asbestos should support n concurrently active
users with only n + C pages.
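
The reported measurements fit the n + C model exactly at the three data points given. The sketch below checks this and contrasts it with the 20n estimate for the fork-accept model; the function names are arbitrary, and the fork-accept figure counts only the additional per-user pages, not the shared baseline.

```python
MEASURED = {1: 1264, 2: 1265, 25: 1288}  # max pages in use, from the text


def pages_page_labeling(n, pages_at_one_user=1264):
    """n + C model: one extra page for each user beyond the first."""
    return pages_at_one_user + (n - 1)


def extra_pages_fork_accept(n, pages_per_process=20):
    """Per-user cost if each user needed a separate worker process."""
    return pages_per_process * n


for n, measured in MEASURED.items():
    print(n, measured, pages_page_labeling(n), extra_pages_fork_accept(n))
```

Here C works out to 1263. For 25 users, page labeling adds 24 pages over the single-user case, whereas 25 separate worker processes would need on the order of 500 pages.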

9  Conclusion 

Asbestos is an operating system that makes nondiscretionary
access control mechanisms available to unprivileged users,
giving them fine-grained, end-to-end control over the dissemination
of information. Asbestos provides protection through
a new labeling scheme, which, unlike schemes in previous operating
systems, allows data to be sanitized (or “untainted”)
by individual users within categories they control. The categories,
called handles, use the same names as communication
endpoints, making them a kind of generalization of capabilities.
Like capabilities, processes can dynamically generate
new handles, handle ownership is aggregated by process
(allowing explicit enumeration of privileges), and processes
specify temporary label restrictions on sent messages
to avoid the unintentional use of privilege.

The Asbestos virtual memory system allows labels to be
applied at the granularity of individual pages, so that one can
control the flow of information even within one process. A
prototype web server handles labeled data in such a way that
even software bugs cannot cause one user to receive another’s
private data. The system requires only one page of memory
per active user, and exhibits a tolerable slowdown of only
12% for the vastly increased security of fine-grained information
flow control.